spark git commit: [SPARK-21250][WEB-UI] Add a url in the table of 'Running Executors' in worker page to visit job page.
Repository: spark
Updated Branches:
  refs/heads/master d4107196d -> d913db16a

[SPARK-21250][WEB-UI] Add a url in the table of 'Running Executors' in worker page to visit job page.

## What changes were proposed in this pull request?

Add a URL to the 'Name' entry in the 'Running Executors' table on the worker page, so that clicking the name jumps to the application's job page. This applies only to the 'Running Executors' table; in the 'Finished Executors' table the 'Name' entry carries no URL, and clicking it does not navigate anywhere.

fix before:
![1](https://user-images.githubusercontent.com/26266482/27679397-30ddc262-5ceb-11e7-839b-0889d1f42480.png)

fix after:
![2](https://user-images.githubusercontent.com/26266482/27679405-3588ef12-5ceb-11e7-9756-0a93815cd698.png)

## How was this patch tested?

Manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: guoxiaolong

Closes #18464 from guoxiaolongzte/SPARK-21250.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d913db16
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d913db16
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d913db16

Branch: refs/heads/master
Commit: d913db16a0de0983961f9d0c5f9b146be7226ac1
Parents: d410719
Author: guoxiaolong
Authored: Mon Jul 3 13:31:01 2017 +0800
Committer: Wenchen Fan
Committed: Mon Jul 3 13:31:01 2017 +0800

----
 .../org/apache/spark/deploy/worker/ui/WorkerPage.scala | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)
----

http://git-wip-us.apache.org/repos/asf/spark/blob/d913db16/core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala
----
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala b/core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala
index 1ad9731..ea39b0d 100644
--- a/core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala
@@ -23,8 +23,8 @@ import scala.xml.Node
 
 import org.json4s.JValue
 
+import org.apache.spark.deploy.{ExecutorState, JsonProtocol}
 import org.apache.spark.deploy.DeployMessages.{RequestWorkerState, WorkerStateResponse}
-import org.apache.spark.deploy.JsonProtocol
 import org.apache.spark.deploy.master.DriverState
 import org.apache.spark.deploy.worker.{DriverRunner, ExecutorRunner}
 import org.apache.spark.ui.{UIUtils, WebUIPage}
@@ -112,7 +112,15 @@ private[ui] class WorkerPage(parent: WorkerWebUI) extends WebUIPage("") {
           <li><strong>ID:</strong> {executor.appId}</li>
-          <li><strong>Name:</strong> {executor.appDesc.name}</li>
+          <li><strong>Name:</strong>
+            {
+              if ({executor.state == ExecutorState.RUNNING} && executor.appDesc.appUiUrl.nonEmpty) {
+                <a href={executor.appDesc.appUiUrl}> {executor.appDesc.name}</a>
+              } else {
+                {executor.appDesc.name}
+              }
+            }
+          </li>
           <li><strong>User:</strong> {executor.appDesc.user}</li>
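For reference, the conditional-link pattern used in the patch can be isolated into a tiny self-contained sketch (a sketch only; `name`, `appUiUrl`, and `running` are stand-ins for the executor fields the patch actually reads):

    import scala.xml.{Node, Text}

    // Render a name as a link only when the application is running and
    // exposes a UI URL; otherwise fall back to plain text, as in the
    // 'Finished Executors' table.
    def nameNode(name: String, appUiUrl: String, running: Boolean): Node =
      if (running && appUiUrl.nonEmpty) <a href={appUiUrl}>{name}</a>
      else Text(name)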
spark git commit: [SPARK-21282][TEST][2.0] Fix test failure in 2.0
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 44a97f70f -> 4229e1605

[SPARK-21282][TEST][2.0] Fix test failure in 2.0

### What changes were proposed in this pull request?

There is a test failure after backporting a fix from 2.2 to 2.0, because the automatically generated column names differ between 2.2 and 2.0:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/lastCompletedBuild/testReport/

This PR re-generates the result files.

### How was this patch tested?

N/A

Author: gatorsmile

Closes #18506 from gatorsmile/fixFailure.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4229e160
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4229e160
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4229e160

Branch: refs/heads/branch-2.0
Commit: 4229e16058f355e90dc1d177563c21e88d412c2b
Parents: 44a97f7
Author: gatorsmile
Authored: Mon Jul 3 13:28:51 2017 +0800
Committer: Wenchen Fan
Committed: Mon Jul 3 13:28:51 2017 +0800

----
 .../src/test/resources/sql-tests/results/arithmetic.sql.out | 6 +++---
 sql/core/src/test/resources/sql-tests/results/array.sql.out | 5 ++++-
 2 files changed, 7 insertions(+), 4 deletions(-)
----

http://git-wip-us.apache.org/repos/asf/spark/blob/4229e160/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out
----
diff --git a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out
index 23281c6..c2e9bd5 100644
--- a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out
@@ -281,7 +281,7 @@ struct
 -- !query 34
 select ceil(1234567890123456)
 -- !query 34 schema
-struct
+struct
 -- !query 34 output
 1234567890123456
 
@@ -289,7 +289,7 @@ struct
 -- !query 35
 select ceiling(1234567890123456)
 -- !query 35 schema
-struct
+struct
 -- !query 35 output
 1234567890123456
 
@@ -329,7 +329,7 @@ struct
 -- !query 40
 select floor(1234567890123456)
 -- !query 40 schema
-struct
+struct
 -- !query 40 output
 1234567890123456

http://git-wip-us.apache.org/repos/asf/spark/blob/4229e160/sql/core/src/test/resources/sql-tests/results/array.sql.out
----
diff --git a/sql/core/src/test/resources/sql-tests/results/array.sql.out b/sql/core/src/test/resources/sql-tests/results/array.sql.out
index 499a3d5..981b250 100644
--- a/sql/core/src/test/resources/sql-tests/results/array.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/array.sql.out
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 10
+-- Number of queries: 12
 
 
 -- !query 0
@@ -124,6 +124,7 @@ struct
 org.apache.spark.sql.AnalysisException
 cannot resolve 'sort_array(array('b', 'd'), '1')' due to data type mismatch: Sort order in second argument requires a boolean literal.; line 1 pos 7
+
 
 -- !query 10
 select sort_array(array('b', 'd'), cast(NULL as boolean))
 -- !query 10 schema
@@ -140,6 +142,7 @@ struct<>
 org.apache.spark.sql.AnalysisException
 cannot resolve 'sort_array(array('b', 'd'), CAST(NULL AS BOOLEAN))' due to data type mismatch: Sort order in second argument requires a boolean literal.; line 1 pos 7
+
 
 -- !query 11
 select size(boolean_array),
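For reference, golden result files such as arithmetic.sql.out and array.sql.out are generated by SQLQueryTestSuite rather than edited by hand; they can typically be re-generated by running the suite with the SPARK_GENERATE_GOLDEN_FILES environment variable set (a usage sketch, assuming the standard sbt build of this era):

    SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite"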
spark git commit: [SPARK-18004][SQL] Make sure the date or timestamp related predicate can be pushed down to Oracle correctly
Repository: spark
Updated Branches:
  refs/heads/master c19680be1 -> d4107196d

[SPARK-18004][SQL] Make sure the date or timestamp related predicate can be pushed down to Oracle correctly

## What changes were proposed in this pull request?

Move the `compileValue` method from JDBCRDD to JdbcDialect, and override `compileValue` in OracleDialect to rewrite the Oracle-specific timestamp and date literals in the where clause.

## How was this patch tested?

An integration test has been added.

Author: Rui Zha
Author: Zharui

Closes #18451 from SharpRay/extend-compileValue-to-dialects.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d4107196
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d4107196
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d4107196

Branch: refs/heads/master
Commit: d4107196d59638845bd19da6aab074424d90ddaf
Parents: c19680b
Author: Rui Zha
Authored: Sun Jul 2 17:37:47 2017 -0700
Committer: gatorsmile
Committed: Sun Jul 2 17:37:47 2017 -0700

----
 .../spark/sql/jdbc/OracleIntegrationSuite.scala | 45
 .../execution/datasources/jdbc/JDBCRDD.scala    | 35 +--
 .../apache/spark/sql/jdbc/JdbcDialects.scala    | 27 +++-
 .../apache/spark/sql/jdbc/OracleDialect.scala   | 15 ++-
 4 files changed, 95 insertions(+), 27 deletions(-)
----

http://git-wip-us.apache.org/repos/asf/spark/blob/d4107196/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
----
diff --git a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
index b2f0969..e14810a 100644
--- a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
+++ b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
@@ -223,4 +223,49 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
     val types = rows(0).toSeq.map(x => x.getClass.toString)
     assert(types(1).equals("class java.sql.Timestamp"))
   }
+
+  test("SPARK-18004: Make sure date or timestamp related predicate is pushed down correctly") {
+    val props = new Properties()
+    props.put("oracle.jdbc.mapDateToTimestamp", "false")
+
+    val schema = StructType(Seq(
+      StructField("date_type", DateType, true),
+      StructField("timestamp_type", TimestampType, true)
+    ))
+
+    val tableName = "test_date_timestamp_pushdown"
+    val dateVal = Date.valueOf("2017-06-22")
+    val timestampVal = Timestamp.valueOf("2017-06-22 21:30:07")
+
+    val data = spark.sparkContext.parallelize(Seq(
+      Row(dateVal, timestampVal)
+    ))
+
+    val dfWrite = spark.createDataFrame(data, schema)
+    dfWrite.write.jdbc(jdbcUrl, tableName, props)
+
+    val dfRead = spark.read.jdbc(jdbcUrl, tableName, props)
+
+    val millis = System.currentTimeMillis()
+    val dt = new java.sql.Date(millis)
+    val ts = new java.sql.Timestamp(millis)
+
+    // Query the Oracle table with date and timestamp predicates,
+    // which should be pushed down to Oracle.
+    val df = dfRead.filter(dfRead.col("date_type").lt(dt))
+      .filter(dfRead.col("timestamp_type").lt(ts))
+
+    val metadata = df.queryExecution.sparkPlan.metadata
+    // The "PushedFilters" entry should exist in the DataFrame's physical
+    // plan, and the presence of the right literals in "PushedFilters"
+    // proves that predicate pushdown has been effective.
+    assert(metadata.get("PushedFilters").ne(None))
+    assert(metadata("PushedFilters").contains(dt.toString))
+    assert(metadata("PushedFilters").contains(ts.toString))
+
+    val row = df.collect()(0)
+    assert(row.getDate(0).equals(dateVal))
+    assert(row.getTimestamp(1).equals(timestampVal))
+  }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/d4107196/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
----
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
index 2bdc432..0f53b5c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
@@ -17,12 +17,10 @@ package org.apache.spark.sql.execution.datasources.jdbc
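The dialect-level rewrite described above can be sketched as follows (a sketch inferred from the PR description, not the verbatim patch; it assumes `compileValue` has become an overridable method on JdbcDialect, with OracleDialect emitting the JDBC escape syntax for date and timestamp literals):

    // In OracleDialect: compile date/timestamp literals into the JDBC
    // escape syntax Oracle accepts in a WHERE clause, e.g.
    //   {ts '2017-06-22 21:30:07'} and {d '2017-06-22'}
    override def compileValue(value: Any): Any = value match {
      case timestampValue: java.sql.Timestamp => "{ts '" + timestampValue + "'}"
      case dateValue: java.sql.Date => "{d '" + dateValue + "'}"
      case other => super.compileValue(other)
    }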
spark git commit: [SPARK-19852][PYSPARK][ML] Python StringIndexer supports 'keep' to handle invalid data
Repository: spark
Updated Branches:
  refs/heads/master c605fee01 -> c19680be1

[SPARK-19852][PYSPARK][ML] Python StringIndexer supports 'keep' to handle invalid data

## What changes were proposed in this pull request?

This PR maintains API parity with the changes made in SPARK-17498, supporting the new option 'keep' in StringIndexer to handle unseen labels or NULL values in PySpark.

Note: This is an updated version of #17237; the primary author of this PR is VinceShieh.

## How was this patch tested?

Unit tests.

Author: VinceShieh
Author: Yanbo Liang

Closes #18453 from yanboliang/spark-19852.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c19680be
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c19680be
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c19680be

Branch: refs/heads/master
Commit: c19680be1c532dded1e70edce7a981ba28af09ad
Parents: c605fee
Author: Yanbo Liang
Authored: Sun Jul 2 16:17:03 2017 +0800
Committer: Yanbo Liang
Committed: Sun Jul 2 16:17:03 2017 +0800

----
 python/pyspark/ml/feature.py |  6 ++
 python/pyspark/ml/tests.py   | 21 +
 2 files changed, 27 insertions(+)
----

http://git-wip-us.apache.org/repos/asf/spark/blob/c19680be/python/pyspark/ml/feature.py
----
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index 77de1cc..25ad06f 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -2132,6 +2132,12 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,
                             "frequencyDesc, frequencyAsc, alphabetDesc, alphabetAsc.",
                             typeConverter=TypeConverters.toString)
 
+    handleInvalid = Param(Params._dummy(), "handleInvalid", "how to handle invalid data (unseen " +
+                          "labels or NULL values). Options are 'skip' (filter out rows with " +
+                          "invalid data), error (throw an error), or 'keep' (put invalid data " +
+                          "in a special additional bucket, at index numLabels).",
+                          typeConverter=TypeConverters.toString)
+
     @keyword_only
     def __init__(self, inputCol=None, outputCol=None, handleInvalid="error",
                  stringOrderType="frequencyDesc"):

http://git-wip-us.apache.org/repos/asf/spark/blob/c19680be/python/pyspark/ml/tests.py
----
diff --git a/python/pyspark/ml/tests.py b/python/pyspark/ml/tests.py
index 17a3947..ffb8b0a 100755
--- a/python/pyspark/ml/tests.py
+++ b/python/pyspark/ml/tests.py
@@ -551,6 +551,27 @@ class FeatureTests(SparkSessionTestCase):
         for i in range(0, len(expected)):
             self.assertTrue(all(observed[i]["features"].toArray() == expected[i]))
 
+    def test_string_indexer_handle_invalid(self):
+        df = self.spark.createDataFrame([
+            (0, "a"),
+            (1, "d"),
+            (2, None)], ["id", "label"])
+
+        si1 = StringIndexer(inputCol="label", outputCol="indexed", handleInvalid="keep",
+                            stringOrderType="alphabetAsc")
+        model1 = si1.fit(df)
+        td1 = model1.transform(df)
+        actual1 = td1.select("id", "indexed").collect()
+        expected1 = [Row(id=0, indexed=0.0), Row(id=1, indexed=1.0), Row(id=2, indexed=2.0)]
+        self.assertEqual(actual1, expected1)
+
+        si2 = si1.setHandleInvalid("skip")
+        model2 = si2.fit(df)
+        td2 = model2.transform(df)
+        actual2 = td2.select("id", "indexed").collect()
+        expected2 = [Row(id=0, indexed=0.0), Row(id=1, indexed=1.0)]
+        self.assertEqual(actual2, expected2)
+
 
 class HasInducedError(Params):
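Since this PR brings PySpark to parity with the Scala API from SPARK-17498, the equivalent behavior on the Scala side can be sketched as follows (a minimal usage sketch, assuming a SparkSession named `spark` is in scope):

    import org.apache.spark.ml.feature.StringIndexer

    val df = spark.createDataFrame(Seq((0, "a"), (1, "d"))).toDF("id", "label")
    val indexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("indexed")
      .setHandleInvalid("keep")  // unseen labels / NULL values go to index numLabels
    val indexed = indexer.fit(df).transform(df)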
spark git commit: [SPARK-21260][SQL][MINOR] Remove the unused OutputFakerExec
Repository: spark
Updated Branches:
  refs/heads/master 6beca9ce9 -> c605fee01

[SPARK-21260][SQL][MINOR] Remove the unused OutputFakerExec

## What changes were proposed in this pull request?

OutputFakerExec was added long ago and is not used anywhere now, so we should remove it.

## How was this patch tested?

N/A

Author: Xingbo Jiang

Closes #18473 from jiangxb1987/OutputFakerExec.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c605fee0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c605fee0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c605fee0

Branch: refs/heads/master
Commit: c605fee01f180588ecb2f48710a7b84073bd3b9a
Parents: 6beca9c
Author: Xingbo Jiang
Authored: Sun Jul 2 08:50:48 2017 +0100
Committer: Sean Owen
Committed: Sun Jul 2 08:50:48 2017 +0100

----
 .../spark/sql/execution/basicPhysicalOperators.scala | 11 ---
 1 file changed, 11 deletions(-)
----

http://git-wip-us.apache.org/repos/asf/spark/blob/c605fee0/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
----
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
index f3ca839..2151c33 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
@@ -585,17 +585,6 @@ case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecN
 }
 
 /**
- * A plan node that does nothing but lie about the output of its child. Used to spice a
- * (hopefully structurally equivalent) tree from a different optimization sequence into an already
- * resolved tree.
- */
-case class OutputFakerExec(output: Seq[Attribute], child: SparkPlan) extends SparkPlan {
-  def children: Seq[SparkPlan] = child :: Nil
-
-  protected override def doExecute(): RDD[InternalRow] = child.execute()
-}
-
-/**
  * Physical plan for a subquery.
  */
 case class SubqueryExec(name: String, child: SparkPlan) extends UnaryExecNode {