spark git commit: [SPARK-21250][WEB-UI] Add a url in the table of 'Running Executors' in worker page to visit job page.

2017-07-02 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master d4107196d -> d913db16a


[SPARK-21250][WEB-UI] Add a url in the table of 'Running Executors' in worker 
page to visit job page.

## What changes were proposed in this pull request?

Add a URL in the 'Running Executors' table on the worker page that links to the job page.

Clicking the URL in the 'Name' column now jumps from the worker page to the
corresponding job page. This applies only to the 'Running Executors' table; in
the 'Finished Executors' table the name is not a link, so clicking it does not
navigate anywhere.
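
In Scala XML-literal terms the change boils down to the conditional below (a
minimal sketch, not the verbatim patch; `nameCell` and its parameters are
illustrative stand-ins for the executor fields used in WorkerPage.executorRow,
shown in the diff further down):

import scala.xml.{Node, Text}

// Render the app name as a link only when the executor is still running and
// the application registered a web UI URL; otherwise keep plain text so the
// 'Finished Executors' table stays non-clickable.
def nameCell(name: String, running: Boolean, appUiUrl: String): Node =
  if (running && appUiUrl.nonEmpty) {
    <a href={appUiUrl}>{name}</a> // jumps to the job (application) page
  } else {
    Text(name)
  }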

Before the fix:
![1](https://user-images.githubusercontent.com/26266482/27679397-30ddc262-5ceb-11e7-839b-0889d1f42480.png)

After the fix:
![2](https://user-images.githubusercontent.com/26266482/27679405-3588ef12-5ceb-11e7-9756-0a93815cd698.png)

## How was this patch tested?
Manual tests.


Author: guoxiaolong 

Closes #18464 from guoxiaolongzte/SPARK-21250.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d913db16
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d913db16
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d913db16

Branch: refs/heads/master
Commit: d913db16a0de0983961f9d0c5f9b146be7226ac1
Parents: d410719
Author: guoxiaolong 
Authored: Mon Jul 3 13:31:01 2017 +0800
Committer: Wenchen Fan 
Committed: Mon Jul 3 13:31:01 2017 +0800

--
 .../org/apache/spark/deploy/worker/ui/WorkerPage.scala  | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d913db16/core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala b/core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala
index 1ad9731..ea39b0d 100644
--- a/core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala
@@ -23,8 +23,8 @@ import scala.xml.Node
 
 import org.json4s.JValue
 
+import org.apache.spark.deploy.{ExecutorState, JsonProtocol}
 import org.apache.spark.deploy.DeployMessages.{RequestWorkerState, WorkerStateResponse}
-import org.apache.spark.deploy.JsonProtocol
 import org.apache.spark.deploy.master.DriverState
 import org.apache.spark.deploy.worker.{DriverRunner, ExecutorRunner}
 import org.apache.spark.ui.{UIUtils, WebUIPage}
@@ -112,7 +112,15 @@ private[ui] class WorkerPage(parent: WorkerWebUI) extends WebUIPage("") {
       <td>
         <ul class="unstyled">
           <li><strong>ID:</strong> {executor.appId}</li>
-          <li><strong>Name:</strong> {executor.appDesc.name}</li>
+          <li><strong>Name:</strong>
+          {
+            if ({executor.state == ExecutorState.RUNNING} && executor.appDesc.appUiUrl.nonEmpty) {
+              <a href={executor.appDesc.appUiUrl}> {executor.appDesc.name}</a>
+            } else {
+              {executor.appDesc.name}
+            }
+          }
+          </li>
           <li><strong>User:</strong> {executor.appDesc.user}</li>
         </ul>
       </td>





spark git commit: [SPARK-21282][TEST][2.0] Fix test failure in 2.0

2017-07-02 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 44a97f70f -> 4229e1605


[SPARK-21282][TEST][2.0] Fix test failure in 2.0

### What changes were proposed in this pull request?

There is a test failure after backporting a fix from 2.2 to 2.0, because the automatically generated column names are different between 2.2 and 2.0:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/lastCompletedBuild/testReport/

This PR is to re-generate the result file.
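
(For reference: the files under sql/core/src/test/resources/sql-tests/results
are golden files; they are regenerated by running SQLQueryTestSuite with the
SPARK_GENERATE_GOLDEN_FILES=1 environment variable set, which overwrites each
expected output with the actual output of the corresponding query.)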

### How was this patch tested?
N/A

Author: gatorsmile 

Closes #18506 from gatorsmile/fixFailure.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4229e160
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4229e160
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4229e160

Branch: refs/heads/branch-2.0
Commit: 4229e16058f355e90dc1d177563c21e88d412c2b
Parents: 44a97f7
Author: gatorsmile 
Authored: Mon Jul 3 13:28:51 2017 +0800
Committer: Wenchen Fan 
Committed: Mon Jul 3 13:28:51 2017 +0800

--
 .../src/test/resources/sql-tests/results/arithmetic.sql.out| 6 +++---
 sql/core/src/test/resources/sql-tests/results/array.sql.out| 5 -
 2 files changed, 7 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4229e160/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out
--
diff --git a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out
index 23281c6..c2e9bd5 100644
--- a/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/arithmetic.sql.out
@@ -281,7 +281,7 @@ struct
 -- !query 34
 select ceil(1234567890123456)
 -- !query 34 schema
-struct
+struct
 -- !query 34 output
 1234567890123456
 
@@ -289,7 +289,7 @@ struct
 -- !query 35
 select ceiling(1234567890123456)
 -- !query 35 schema
-struct
+struct
 -- !query 35 output
 1234567890123456
 
@@ -329,7 +329,7 @@ struct
 -- !query 40
 select floor(1234567890123456)
 -- !query 40 schema
-struct
+struct
 -- !query 40 output
 1234567890123456
 

http://git-wip-us.apache.org/repos/asf/spark/blob/4229e160/sql/core/src/test/resources/sql-tests/results/array.sql.out
--
diff --git a/sql/core/src/test/resources/sql-tests/results/array.sql.out b/sql/core/src/test/resources/sql-tests/results/array.sql.out
index 499a3d5..981b250 100644
--- a/sql/core/src/test/resources/sql-tests/results/array.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/array.sql.out
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 10
+-- Number of queries: 12
 
 
 -- !query 0
@@ -124,6 +124,7 @@ struct
 org.apache.spark.sql.AnalysisException
 cannot resolve 'sort_array(array('b', 'd'), '1')' due to data type mismatch: 
Sort order in second argument requires a boolean literal.; line 1 pos 7
 
+
 -- !query 10
 select sort_array(array('b', 'd'), cast(NULL as boolean))
 -- !query 10 schema
@@ -140,6 +142,7 @@ struct<>
 org.apache.spark.sql.AnalysisException
 cannot resolve 'sort_array(array('b', 'd'), CAST(NULL AS BOOLEAN))' due to 
data type mismatch: Sort order in second argument requires a boolean literal.; 
line 1 pos 7
 
+
 -- !query 11
 select
   size(boolean_array),





spark git commit: [SPARK-18004][SQL] Make sure the date or timestamp related predicate can be pushed down to Oracle correctly

2017-07-02 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master c19680be1 -> d4107196d


[SPARK-18004][SQL] Make sure the date or timestamp related predicate can be 
pushed down to Oracle correctly

## What changes were proposed in this pull request?

Move the `compileValue` method from JDBCRDD to JdbcDialect, and override it in 
OracleDialect to rewrite the timestamp and date literals in the WHERE clause 
into Oracle-specific syntax.
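
The core of the OracleDialect override follows the pattern sketched below (a
hedged sketch, not the verbatim patch; `OracleLikeDialect` is an illustrative
name, and the default branch simply delegates to the base implementation that
this PR moves into JdbcDialect):

import java.sql.{Date, Timestamp}

import org.apache.spark.sql.jdbc.JdbcDialect

// Oracle accepts the JDBC escape syntax {d 'yyyy-mm-dd'} for dates and
// {ts 'yyyy-mm-dd hh:mm:ss'} for timestamps, but rejects the bare quoted
// literals that the generic dialect emits for these types.
case object OracleLikeDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def compileValue(value: Any): Any = value match {
    case timestampValue: Timestamp => "{ts '" + timestampValue + "'}"
    case dateValue: Date => "{d '" + dateValue + "'}"
    case _ => super.compileValue(value)
  }
}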

## How was this patch tested?

An integration test has been added.

Author: Rui Zha 
Author: Zharui 

Closes #18451 from SharpRay/extend-compileValue-to-dialects.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d4107196
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d4107196
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d4107196

Branch: refs/heads/master
Commit: d4107196d59638845bd19da6aab074424d90ddaf
Parents: c19680b
Author: Rui Zha 
Authored: Sun Jul 2 17:37:47 2017 -0700
Committer: gatorsmile 
Committed: Sun Jul 2 17:37:47 2017 -0700

--
 .../spark/sql/jdbc/OracleIntegrationSuite.scala | 45 
 .../execution/datasources/jdbc/JDBCRDD.scala| 35 +--
 .../apache/spark/sql/jdbc/JdbcDialects.scala| 27 +++-
 .../apache/spark/sql/jdbc/OracleDialect.scala   | 15 ++-
 4 files changed, 95 insertions(+), 27 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d4107196/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
--
diff --git a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
index b2f0969..e14810a 100644
--- a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
+++ b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
@@ -223,4 +223,49 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLContext {
     val types = rows(0).toSeq.map(x => x.getClass.toString)
     assert(types(1).equals("class java.sql.Timestamp"))
   }
+
+  test("SPARK-18004: Make sure date or timestamp related predicate is pushed down correctly") {
+    val props = new Properties()
+    props.put("oracle.jdbc.mapDateToTimestamp", "false")
+
+    val schema = StructType(Seq(
+      StructField("date_type", DateType, true),
+      StructField("timestamp_type", TimestampType, true)
+    ))
+
+    val tableName = "test_date_timestamp_pushdown"
+    val dateVal = Date.valueOf("2017-06-22")
+    val timestampVal = Timestamp.valueOf("2017-06-22 21:30:07")
+
+    val data = spark.sparkContext.parallelize(Seq(
+      Row(dateVal, timestampVal)
+    ))
+
+    val dfWrite = spark.createDataFrame(data, schema)
+    dfWrite.write.jdbc(jdbcUrl, tableName, props)
+
+    val dfRead = spark.read.jdbc(jdbcUrl, tableName, props)
+
+    val millis = System.currentTimeMillis()
+    val dt = new java.sql.Date(millis)
+    val ts = new java.sql.Timestamp(millis)
+
+    // Query the Oracle table with date and timestamp predicates,
+    // which should be pushed down to Oracle.
+    val df = dfRead.filter(dfRead.col("date_type").lt(dt))
+      .filter(dfRead.col("timestamp_type").lt(ts))
+
+    val metadata = df.queryExecution.sparkPlan.metadata
+    // The "PushedFilters" entry should exist in the DataFrame's
+    // physical plan, and the presence of the right literals in
+    // "PushedFilters" is used to prove that the predicate
+    // pushdown has been effective.
+    assert(metadata.get("PushedFilters").ne(None))
+    assert(metadata("PushedFilters").contains(dt.toString))
+    assert(metadata("PushedFilters").contains(ts.toString))
+
+    val row = df.collect()(0)
+    assert(row.getDate(0).equals(dateVal))
+    assert(row.getTimestamp(1).equals(timestampVal))
+  }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/d4107196/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
index 2bdc432..0f53b5c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
@@ -17,12 +17,10 @@
 
 package org.apache.spark.sql.execution.datasources.jdbc
 

spark git commit: [SPARK-19852][PYSPARK][ML] Python StringIndexer supports 'keep' to handle invalid data

2017-07-02 Thread yliang
Repository: spark
Updated Branches:
  refs/heads/master c605fee01 -> c19680be1


[SPARK-19852][PYSPARK][ML] Python StringIndexer supports 'keep' to handle 
invalid data

## What changes were proposed in this pull request?
This PR maintains API parity with the changes made in SPARK-17498 by supporting 
the new 'keep' option in StringIndexer, which handles unseen labels or NULL 
values in PySpark.

Note: This is an updated version of #17237; the primary author of this PR is 
VinceShieh.
## How was this patch tested?
Unit tests.
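
For reference, a minimal Scala sketch of the SPARK-17498 behavior that this PR
mirrors in PySpark (assumes an existing SparkSession named `spark`; the data is
illustrative):

import org.apache.spark.ml.feature.StringIndexer

// Fit on labels "a" and "d", then transform a row with the unseen label "x".
val train = spark.createDataFrame(Seq((0, "a"), (1, "d"))).toDF("id", "label")
val test = spark.createDataFrame(Seq((2, "x"))).toDF("id", "label")

val model = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexed")
  .setHandleInvalid("keep") // alternatives: "error" (the default) and "skip"
  .fit(train)

// With "keep", "x" lands in the extra bucket at index numLabels (here 2.0);
// with "skip" the row is filtered out; with "error" the transform throws.
model.transform(test).show()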

Author: VinceShieh 
Author: Yanbo Liang 

Closes #18453 from yanboliang/spark-19852.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c19680be
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c19680be
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c19680be

Branch: refs/heads/master
Commit: c19680be1c532dded1e70edce7a981ba28af09ad
Parents: c605fee
Author: Yanbo Liang 
Authored: Sun Jul 2 16:17:03 2017 +0800
Committer: Yanbo Liang 
Committed: Sun Jul 2 16:17:03 2017 +0800

--
 python/pyspark/ml/feature.py |  6 ++
 python/pyspark/ml/tests.py   | 21 +
 2 files changed, 27 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c19680be/python/pyspark/ml/feature.py
--
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index 77de1cc..25ad06f 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -2132,6 +2132,12 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,
                             "frequencyDesc, frequencyAsc, alphabetDesc, alphabetAsc.",
                             typeConverter=TypeConverters.toString)
 
+    handleInvalid = Param(Params._dummy(), "handleInvalid", "how to handle invalid data (unseen " +
+                          "labels or NULL values). Options are 'skip' (filter out rows with " +
+                          "invalid data), error (throw an error), or 'keep' (put invalid data " +
+                          "in a special additional bucket, at index numLabels).",
+                          typeConverter=TypeConverters.toString)
+
 @keyword_only
 def __init__(self, inputCol=None, outputCol=None, handleInvalid="error",
  stringOrderType="frequencyDesc"):

http://git-wip-us.apache.org/repos/asf/spark/blob/c19680be/python/pyspark/ml/tests.py
--
diff --git a/python/pyspark/ml/tests.py b/python/pyspark/ml/tests.py
index 17a3947..ffb8b0a 100755
--- a/python/pyspark/ml/tests.py
+++ b/python/pyspark/ml/tests.py
@@ -551,6 +551,27 @@ class FeatureTests(SparkSessionTestCase):
         for i in range(0, len(expected)):
             self.assertTrue(all(observed[i]["features"].toArray() == expected[i]))
 
+    def test_string_indexer_handle_invalid(self):
+        df = self.spark.createDataFrame([
+            (0, "a"),
+            (1, "d"),
+            (2, None)], ["id", "label"])
+
+        si1 = StringIndexer(inputCol="label", outputCol="indexed", handleInvalid="keep",
+                            stringOrderType="alphabetAsc")
+        model1 = si1.fit(df)
+        td1 = model1.transform(df)
+        actual1 = td1.select("id", "indexed").collect()
+        expected1 = [Row(id=0, indexed=0.0), Row(id=1, indexed=1.0), Row(id=2, indexed=2.0)]
+        self.assertEqual(actual1, expected1)
+
+        si2 = si1.setHandleInvalid("skip")
+        model2 = si2.fit(df)
+        td2 = model2.transform(df)
+        actual2 = td2.select("id", "indexed").collect()
+        expected2 = [Row(id=0, indexed=0.0), Row(id=1, indexed=1.0)]
+        self.assertEqual(actual2, expected2)
+
 
 class HasInducedError(Params):
 





spark git commit: [SPARK-21260][SQL][MINOR] Remove the unused OutputFakerExec

2017-07-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 6beca9ce9 -> c605fee01


[SPARK-21260][SQL][MINOR] Remove the unused OutputFakerExec

## What changes were proposed in this pull request?

OutputFakerExec was added long ago and is not used anywhere now, so we should 
remove it.

## How was this patch tested?
N/A

Author: Xingbo Jiang 

Closes #18473 from jiangxb1987/OutputFakerExec.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c605fee0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c605fee0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c605fee0

Branch: refs/heads/master
Commit: c605fee01f180588ecb2f48710a7b84073bd3b9a
Parents: 6beca9c
Author: Xingbo Jiang 
Authored: Sun Jul 2 08:50:48 2017 +0100
Committer: Sean Owen 
Committed: Sun Jul 2 08:50:48 2017 +0100

--
 .../spark/sql/execution/basicPhysicalOperators.scala | 11 ---
 1 file changed, 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c605fee0/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
index f3ca839..2151c33 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
@@ -585,17 +585,6 @@ case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecNode {
 }
 
 /**
- * A plan node that does nothing but lie about the output of its child.  Used to spice a
- * (hopefully structurally equivalent) tree from a different optimization sequence into an already
- * resolved tree.
- */
-case class OutputFakerExec(output: Seq[Attribute], child: SparkPlan) extends SparkPlan {
-  def children: Seq[SparkPlan] = child :: Nil
-
-  protected override def doExecute(): RDD[InternalRow] = child.execute()
-}
-
-/**
  * Physical plan for a subquery.
  */
 case class SubqueryExec(name: String, child: SparkPlan) extends UnaryExecNode {

