[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19594
  
**[Test build #84679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84679/testReport)** for PR 19594 at commit [`e69e213`](https://github.com/apache/spark/commit/e69e21348b4cde2abaec9dbb46381caf1ed3a1a4).





[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-08 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/19594
  
retest this please





[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19591
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84677/
Test FAILed.





[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19591
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19591
  
**[Test build #84677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84677/testReport)** for PR 19591 at commit [`ee4098b`](https://github.com/apache/spark/commit/ee4098bf108c8e919b41e392c7316271173e6dc2).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19931
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19931
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84678/
Test PASSed.





[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19931
  
**[Test build #84678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84678/testReport)** for PR 19931 at commit [`b4b1122`](https://github.com/apache/spark/commit/b4b1122b859f7fe8bf8b5ecd9bacbe0a3de0b9ea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19843
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19843
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84675/
Test PASSed.





[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19843
  
**[Test build #84675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84675/testReport)** for PR 19843 at commit [`930c113`](https://github.com/apache/spark/commit/930c113886dd27e784b8d2c6844dd92d8cdaa5a2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19717
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84672/
Test PASSed.





[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19717
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19717
  
**[Test build #84672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84672/testReport)** for PR 19717 at commit [`caf2206`](https://github.com/apache/spark/commit/caf22060f600b3b382e2e98b7ee5f0aacc165f2d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...

2017-12-08 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19676#discussion_r155913190
  
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,17 @@ public static void main(String[] args) {
     KMeans kmeans = new KMeans().setK(2).setSeed(1L);
     KMeansModel model = kmeans.fit(dataset);
 
-    // Evaluate clustering by computing Within Set Sum of Squared Errors.
-    double WSSSE = model.computeCost(dataset);
-    System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+    // Make predictions
+    Dataset<Row> predictions = model.transform(dataset);
+
+    // Evaluate clustering by computing Silhouette score
+    ClusteringEvaluator evaluator = new ClusteringEvaluator()
+      .setFeaturesCol("features")
+      .setPredictionCol("prediction")
--- End diff --

We use default values here, so it's not necessary to set them explicitly. 
We should keep examples as simple as possible. Thanks.
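
For reference, a minimal sketch of the simplified form being suggested, written against the Scala API rather than the Java example under review. The app name and dataset path (`data/mllib/sample_kmeans_data.txt`, which ships with the Spark distribution) are assumptions; the point is that `ClusteringEvaluator`'s default `featuresCol`/`predictionCol` values already match the columns `KMeansModel.transform` produces, so the explicit setters can be dropped:

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.sql.SparkSession

object KMeansSilhouetteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("KMeansSilhouetteSketch").getOrCreate()
    // Path is an assumption: this sample file ships with the Spark source/distribution.
    val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")

    val model = new KMeans().setK(2).setSeed(1L).fit(dataset)
    val predictions = model.transform(dataset)

    // No setFeaturesCol/setPredictionCol calls: the defaults ("features", "prediction")
    // already match the columns produced above, which is the reviewer's point.
    val silhouette = new ClusteringEvaluator().evaluate(predictions)
    println(s"Silhouette = $silhouette")

    spark.stop()
  }
}
```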





[GitHub] spark issue #14129: [SPARK-16280][SQL] Implement histogram_numeric SQL funct...

2017-12-08 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14129
  
Is this PR still available?





[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...

2017-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19931
  
Retest this please





[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19931
  
**[Test build #84678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84678/testReport)** for PR 19931 at commit [`b4b1122`](https://github.com/apache/spark/commit/b4b1122b859f7fe8bf8b5ecd9bacbe0a3de0b9ea).





[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19591
  
**[Test build #84677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84677/testReport)** for PR 19591 at commit [`ee4098b`](https://github.com/apache/spark/commit/ee4098bf108c8e919b41e392c7316271173e6dc2).





[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19931
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84673/
Test FAILed.





[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19931
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19931
  
**[Test build #84673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84673/testReport)** for PR 19931 at commit [`b4b1122`](https://github.com/apache/spark/commit/b4b1122b859f7fe8bf8b5ecd9bacbe0a3de0b9ea).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19594
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19594
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84676/
Test FAILed.





[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19594
  
**[Test build #84676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84676/testReport)** for PR 19594 at commit [`e69e213`](https://github.com/apache/spark/commit/e69e21348b4cde2abaec9dbb46381caf1ed3a1a4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.

2017-12-08 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19591
  
Looks like a legitimate flaky test. Will take a look.





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19829
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84671/
Test FAILed.





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19829
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19829
  
**[Test build #84671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84671/testReport)** for PR 19829 at commit [`96df5f2`](https://github.com/apache/spark/commit/96df5f26d163a4a17d8ab824995b57992afa6b8b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MessageWithHeader extends AbstractFileRegion `
  * `  static class EncryptedMessage extends AbstractFileRegion `
  * `public abstract class AbstractFileRegion extends AbstractReferenceCounted implements FileRegion `





[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19715
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19715
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84674/
Test PASSed.





[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19715
  
**[Test build #84674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84674/testReport)** for PR 19715 at commit [`445bd84`](https://github.com/apache/spark/commit/445bd84a6e5e81896d5c94ada7035b00e2c22337).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...

2017-12-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19769





[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...

2017-12-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19769
  
Merged to master





[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19769
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19769
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84667/
Test PASSed.





[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19769
  
**[Test build #84667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84667/testReport)** for PR 19769 at commit [`1ea75c0`](https://github.com/apache/spark/commit/1ea75c0a8f2c5fed33b2a6d6102ad1d8bdf73906).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19751
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19751
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84665/
Test PASSed.





[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19751
  
**[Test build #84665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84665/testReport)** for PR 19751 at commit [`2606fcd`](https://github.com/apache/spark/commit/2606fcd6493ce7a57f3555c2613d43f1a0391bf7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-08 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19594#discussion_r155910267
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala ---
@@ -67,6 +68,205 @@ class JoinEstimationSuite extends StatsEstimationTestBase {
     rowCount = 2,
     attributeStats = AttributeMap(Seq("key-1-2", "key-2-3").map(nameToColInfo)))
 
+  private def estimateByHistogram(
+      histogram1: Histogram,
+      histogram2: Histogram,
+      expectedMin: Double,
+      expectedMax: Double,
+      expectedNdv: Long,
+      expectedRows: Long): Unit = {
+    val col1 = attr("key1")
+    val col2 = attr("key2")
+    val c1 = generateJoinChild(col1, histogram1, expectedMin, expectedMax)
+    val c2 = generateJoinChild(col2, histogram2, expectedMin, expectedMax)
+
+    val c1JoinC2 = Join(c1, c2, Inner, Some(EqualTo(col1, col2)))
+    val c2JoinC1 = Join(c2, c1, Inner, Some(EqualTo(col2, col1)))
+    val expectedStatsAfterJoin = Statistics(
+      sizeInBytes = expectedRows * (8 + 2 * 4),
+      rowCount = Some(expectedRows),
+      attributeStats = AttributeMap(Seq(
+        col1 -> c1.stats.attributeStats(col1).copy(
+          distinctCount = expectedNdv, min = Some(expectedMin), max = Some(expectedMax)),
+        col2 -> c2.stats.attributeStats(col2).copy(
+          distinctCount = expectedNdv, min = Some(expectedMin), max = Some(expectedMax))))
+    )
+
+    // Join order should not affect estimation result.
+    Seq(c1JoinC2, c2JoinC1).foreach { join =>
+      assert(join.stats == expectedStatsAfterJoin)
+    }
+  }
+
+  private def generateJoinChild(
+      col: Attribute,
+      histogram: Histogram,
+      expectedMin: Double,
+      expectedMax: Double): LogicalPlan = {
+    val colStat = inferColumnStat(histogram)
+    val t = StatsTestPlan(
+      outputList = Seq(col),
+      rowCount = (histogram.height * histogram.bins.length).toLong,
+      attributeStats = AttributeMap(Seq(col -> colStat)))
+
+    val filterCondition = new ArrayBuffer[Expression]()
+    if (expectedMin > colStat.min.get.toString.toDouble) {
+      filterCondition += GreaterThanOrEqual(col, Literal(expectedMin))
+    }
+    if (expectedMax < colStat.max.get.toString.toDouble) {
+      filterCondition += LessThanOrEqual(col, Literal(expectedMax))
+    }
+    if (filterCondition.isEmpty) t else Filter(filterCondition.reduce(And), t)
+  }
+
+  private def inferColumnStat(histogram: Histogram): ColumnStat = {
+    var ndv = 0L
+    for (i <- histogram.bins.indices) {
+      val bin = histogram.bins(i)
+      if (i == 0 || bin.hi != histogram.bins(i - 1).hi) {
+        ndv += bin.ndv
+      }
+    }
+    ColumnStat(distinctCount = ndv, min = Some(histogram.bins.head.lo),
+      max = Some(histogram.bins.last.hi), nullCount = 0, avgLen = 4, maxLen = 4,
+      histogram = Some(histogram))
+  }
+
+  test("equi-height histograms: a bin is contained by another one") {
+    val histogram1 = Histogram(height = 300, Array(
+      HistogramBin(lo = 10, hi = 30, ndv = 10), HistogramBin(lo = 30, hi = 60, ndv = 30)))
+    val histogram2 = Histogram(height = 100, Array(
+      HistogramBin(lo = 0, hi = 50, ndv = 50), HistogramBin(lo = 50, hi = 100, ndv = 40)))
+    // test bin trimming
+    val (t1, h1) = trimBin(histogram2.bins(0), height = 100, min = 10, max = 60)
+    assert(t1 == HistogramBin(lo = 10, hi = 50, ndv = 40) && h1 == 80)
+    val (t2, h2) = trimBin(histogram2.bins(1), height = 100, min = 10, max = 60)
+    assert(t2 == HistogramBin(lo = 50, hi = 60, ndv = 8) && h2 == 20)
+
+    val expectedRanges = Seq(
+      OverlappedRange(10, 30, math.min(10, 40*1/2), math.max(10, 40*1/2), 300, 80*1/2),
+      OverlappedRange(30, 50, math.min(30*2/3, 40*1/2), math.max(30*2/3, 40*1/2), 300*2/3, 80*1/2),
+      OverlappedRange(50, 60, math.min(30*1/3, 8), math.max(30*1/3, 8), 300*1/3, 20)
+    )
+    assert(expectedRanges.equals(
+      getOverlappedRanges(histogram1, histogram2, newMin = 10D, newMax = 60D)))
+
+    estimateByHistogram(
+      histogram1 = histogram1,
+      histogram2 = histogram2,
+      expectedMin = 10D,
+      expectedMax = 60D,
+      // 10 + 20 + 8
+      expectedNdv = 38L,
+      // 300*40/20 + 200*40/20 + 100*20/10
+      expectedRows = 1200L)
+  }
+
+  test("equi-height histograms: a bin has only one value") {
+    val histogram1 = Histogram(height = 300, Array(
+      HistogramBin(lo = 30, hi = 30, ndv = 1), HistogramBin(lo = 30, hi = 60, ndv = 30)))
+    val histogram2 = Histogram(height = 100, Array(
+
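
As a reading aid for the expected values in the quoted test, here is a small sketch of the arithmetic its inline comments encode (an interpretation of the test, not the estimator's implementation): per overlapping range, rows come out as leftRows * rightRows / max(leftNdv, rightNdv) and ndv as min(leftNdv, rightNdv), summed over the ranges.

```scala
object HistogramJoinArithmetic {
  // (leftRows, rightRows, leftNdv, rightNdv) for each overlapping range,
  // taken from the expectedRanges quoted above.
  final case class Overlap(leftRows: Double, rightRows: Double, leftNdv: Double, rightNdv: Double)

  def main(args: Array[String]): Unit = {
    val ranges = Seq(
      Overlap(300, 40, 10, 20),  // range [10, 30)
      Overlap(200, 40, 20, 20),  // range [30, 50)
      Overlap(100, 20, 10, 8)    // range [50, 60]
    )
    val rows = ranges.map(r => r.leftRows * r.rightRows / math.max(r.leftNdv, r.rightNdv)).sum
    val ndv = ranges.map(r => math.min(r.leftNdv, r.rightNdv)).sum
    println(s"rows = $rows, ndv = $ndv")  // rows = 1200.0, ndv = 38.0 -- matches expectedRows/expectedNdv
  }
}
```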

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-08 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19594#discussion_r155910232
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala ---
@@ -67,6 +68,205 @@ class JoinEstimationSuite extends StatsEstimationTestBase {
     rowCount = 2,
     attributeStats = AttributeMap(Seq("key-1-2", "key-2-3").map(nameToColInfo)))
 
+  private def estimateByHistogram(
+      histogram1: Histogram,
+      histogram2: Histogram,
+      expectedMin: Double,
+      expectedMax: Double,
+      expectedNdv: Long,
+      expectedRows: Long): Unit = {
+    val col1 = attr("key1")
+    val col2 = attr("key2")
+    val c1 = generateJoinChild(col1, histogram1, expectedMin, expectedMax)
+    val c2 = generateJoinChild(col2, histogram2, expectedMin, expectedMax)
+
+    val c1JoinC2 = Join(c1, c2, Inner, Some(EqualTo(col1, col2)))
+    val c2JoinC1 = Join(c2, c1, Inner, Some(EqualTo(col2, col1)))
+    val expectedStatsAfterJoin = Statistics(
+      sizeInBytes = expectedRows * (8 + 2 * 4),
+      rowCount = Some(expectedRows),
+      attributeStats = AttributeMap(Seq(
+        col1 -> c1.stats.attributeStats(col1).copy(
+          distinctCount = expectedNdv, min = Some(expectedMin), max = Some(expectedMax)),
+        col2 -> c2.stats.attributeStats(col2).copy(
+          distinctCount = expectedNdv, min = Some(expectedMin), max = Some(expectedMax))))
+    )
+
+    // Join order should not affect estimation result.
+    Seq(c1JoinC2, c2JoinC1).foreach { join =>
+      assert(join.stats == expectedStatsAfterJoin)
+    }
+  }
+
+  private def generateJoinChild(
+      col: Attribute,
+      histogram: Histogram,
+      expectedMin: Double,
+      expectedMax: Double): LogicalPlan = {
+    val colStat = inferColumnStat(histogram)
+    val t = StatsTestPlan(
+      outputList = Seq(col),
+      rowCount = (histogram.height * histogram.bins.length).toLong,
+      attributeStats = AttributeMap(Seq(col -> colStat)))
+
+    val filterCondition = new ArrayBuffer[Expression]()
+    if (expectedMin > colStat.min.get.toString.toDouble) {
+      filterCondition += GreaterThanOrEqual(col, Literal(expectedMin))
+    }
+    if (expectedMax < colStat.max.get.toString.toDouble) {
+      filterCondition += LessThanOrEqual(col, Literal(expectedMax))
+    }
+    if (filterCondition.isEmpty) t else Filter(filterCondition.reduce(And), t)
+  }
+
+  private def inferColumnStat(histogram: Histogram): ColumnStat = {
+    var ndv = 0L
+    for (i <- histogram.bins.indices) {
+      val bin = histogram.bins(i)
+      if (i == 0 || bin.hi != histogram.bins(i - 1).hi) {
+        ndv += bin.ndv
+      }
+    }
+    ColumnStat(distinctCount = ndv, min = Some(histogram.bins.head.lo),
+      max = Some(histogram.bins.last.hi), nullCount = 0, avgLen = 4, maxLen = 4,
+      histogram = Some(histogram))
+  }
+
+  test("equi-height histograms: a bin is contained by another one") {
+    val histogram1 = Histogram(height = 300, Array(
+      HistogramBin(lo = 10, hi = 30, ndv = 10), HistogramBin(lo = 30, hi = 60, ndv = 30)))
+    val histogram2 = Histogram(height = 100, Array(
+      HistogramBin(lo = 0, hi = 50, ndv = 50), HistogramBin(lo = 50, hi = 100, ndv = 40)))
+    // test bin trimming
+    val (t1, h1) = trimBin(histogram2.bins(0), height = 100, min = 10, max = 60)
+    assert(t1 == HistogramBin(lo = 10, hi = 50, ndv = 40) && h1 == 80)
+    val (t2, h2) = trimBin(histogram2.bins(1), height = 100, min = 10, max = 60)
+    assert(t2 == HistogramBin(lo = 50, hi = 60, ndv = 8) && h2 == 20)
+
+    val expectedRanges = Seq(
+      OverlappedRange(10, 30, math.min(10, 40*1/2), math.max(10, 40*1/2), 300, 80*1/2),
+      OverlappedRange(30, 50, math.min(30*2/3, 40*1/2), math.max(30*2/3, 40*1/2), 300*2/3, 80*1/2),
+      OverlappedRange(50, 60, math.min(30*1/3, 8), math.max(30*1/3, 8), 300*1/3, 20)
+    )
+    assert(expectedRanges.equals(
+      getOverlappedRanges(histogram1, histogram2, newMin = 10D, newMax = 60D)))
+
+    estimateByHistogram(
+      histogram1 = histogram1,
+      histogram2 = histogram2,
+      expectedMin = 10D,
+      expectedMax = 60D,
+      // 10 + 20 + 8
+      expectedNdv = 38L,
+      // 300*40/20 + 200*40/20 + 100*20/10
+      expectedRows = 1200L)
+  }
+
+  test("equi-height histograms: a bin has only one value") {
+    val histogram1 = Histogram(height = 300, Array(
+      HistogramBin(lo = 30, hi = 30, ndv = 1), HistogramBin(lo = 30, hi = 60, ndv = 30)))
+    val histogram2 = Histogram(height = 100, Array(
+

[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19594
  
**[Test build #84676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84676/testReport)** for PR 19594 at commit [`e69e213`](https://github.com/apache/spark/commit/e69e21348b4cde2abaec9dbb46381caf1ed3a1a4).





[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19843
  
**[Test build #84675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84675/testReport)** for PR 19843 at commit [`930c113`](https://github.com/apache/spark/commit/930c113886dd27e784b8d2c6844dd92d8cdaa5a2).





[GitHub] spark issue #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.con...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19931
  
**[Test build #84673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84673/testReport)** for PR 19931 at commit [`b4b1122`](https://github.com/apache/spark/commit/b4b1122b859f7fe8bf8b5ecd9bacbe0a3de0b9ea).





[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19715
  
**[Test build #84674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84674/testReport)** for PR 19715 at commit [`445bd84`](https://github.com/apache/spark/commit/445bd84a6e5e81896d5c94ada7035b00e2c22337).





[GitHub] spark pull request #19931: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `sp...

2017-12-08 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/19931

[SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.conf`

## What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/19882, `conf` was mistakenly used to switch the ORC implementation between `native` and `hive`. To affect `OrcTest`, `spark.conf` should be used instead.
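
A minimal sketch of the kind of switch involved, assuming the `spark.sql.orc.impl` key with values `native` and `hive`; this illustrates setting the session's runtime conf and is not the actual diff:

```scala
import org.apache.spark.sql.SparkSession

object OrcImplSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("OrcImplSketch").getOrCreate()

    // Setting the key on the session's runtime conf is what affects queries run
    // through `spark`, and hence what OrcTest-based suites observe.
    // (The other accepted value is "hive", which selects the Hive-based reader.)
    spark.conf.set("spark.sql.orc.impl", "native")

    spark.range(10).write.mode("overwrite").orc("/tmp/orc-impl-sketch")
    spark.read.orc("/tmp/orc-impl-sketch").show()

    spark.stop()
  }
}
```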

## How was this patch tested?

Pass the tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-22672-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19931.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19931


commit b4b1122b859f7fe8bf8b5ecd9bacbe0a3de0b9ea
Author: Dongjoon Hyun 
Date:   2017-12-08T22:17:48Z

[SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.conf`







[GitHub] spark issue #19925: [SPARK-22732] Add Structured Streaming APIs to DataSourc...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19925
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84664/
Test PASSed.





[GitHub] spark issue #19925: [SPARK-22732] Add Structured Streaming APIs to DataSourc...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19925
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19591
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84666/
Test FAILed.





[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19591
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19925: [SPARK-22732] Add Structured Streaming APIs to DataSourc...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19925
  
**[Test build #84664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84664/testReport)** for PR 19925 at commit [`4d166de`](https://github.com/apache/spark/commit/4d166ded90b071332c42704070e98e581fa92042).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19591
  
**[Test build #84666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84666/testReport)** for PR 19591 at commit [`8013766`](https://github.com/apache/spark/commit/8013766d730b9fa14b9d0c71d527dfcfcead8af1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19930: [SPARK-22279][SQL][FOLLOWUP] Preserve a test case...

2017-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/19930





[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-12-08 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/19715
  
@MLnick Thank you very much for your comments! I will change these. 





[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on the issue:

https://github.com/apache/spark/pull/19717
  
@vanzin Created https://issues.apache.org/jira/browse/SPARK-22743 to track 
the work on consolidating the common logic for handling driver and executor 
memory overhead. Addressed other comments in  
https://github.com/apache/spark/pull/19717/commits/caf22060f600b3b382e2e98b7ee5f0aacc165f2d.
 Thanks!





[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19717
  
**[Test build #84672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84672/testReport)** for PR 19717 at commit [`caf2206`](https://github.com/apache/spark/commit/caf22060f600b3b382e2e98b7ee5f0aacc165f2d).





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155908121
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -119,5 +117,46 @@ private[spark] object Config extends Logging {
         "must be a positive integer")
       .createWithDefault(10)
 
+  val WAIT_FOR_APP_COMPLETION =
+    ConfigBuilder("spark.kubernetes.submission.waitAppCompletion")
+      .doc("In cluster mode, whether to wait for the application to finish before exiting the " +
+        "launcher process.")
+      .booleanConf
+      .createWithDefault(true)
+
+  val REPORT_INTERVAL =
+    ConfigBuilder("spark.kubernetes.report.interval")
+      .doc("Interval between reports of the current app status in cluster mode.")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .checkValue(interval => interval > 0, s"Logging interval must be a positive time value.")
+      .createWithDefaultString("1s")
+
+  private[spark] val JARS_DOWNLOAD_LOCATION =
--- End diff --

Done.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155908117
  
--- Diff: docs/configuration.md ---
@@ -157,13 +157,31 @@ of the most common options to set are:
   or in your default properties file.
   </td>
 </tr>
+<tr>
+  <td><code>spark.driver.memoryOverhead</code></td>
+  <td>driverMemory * 0.10, with minimum of 384 </td>
+  <td>
+    The amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode. This is
+    memory that accounts for things like VM overheads, interned strings, other native overheads, etc.
+    This tends to grow with the container size (typically 6-10%).
--- End diff --

Done.





[GitHub] spark pull request #17702: [SPARK-20408][SQL] Get the glob path in parallel ...

2017-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17702#discussion_r155906572
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -668,4 +672,31 @@ object DataSource extends Logging {
     }
     globPath
   }
+
+  /**
+   * Return all paths represented by the wildcard string.
+   * Follow [[InMemoryFileIndex]].bulkListLeafFile and reuse the conf.
+   */
+  private def getGlobbedPaths(
+      sparkSession: SparkSession,
+      fs: FileSystem,
+      hadoopConf: SerializableConfiguration,
+      qualified: Path): Seq[Path] = {
+    val paths = SparkHadoopUtil.get.expandGlobPath(fs, qualified)
+    if (paths.size <= sparkSession.sessionState.conf.parallelPartitionDiscoveryThreshold) {
+      SparkHadoopUtil.get.globPathIfNecessary(fs, qualified)
+    } else {
+      val parallelPartitionDiscoveryParallelism =
+        sparkSession.sessionState.conf.parallelPartitionDiscoveryParallelism
+      val numParallelism = Math.min(paths.size, parallelPartitionDiscoveryParallelism)
+      val expanded = sparkSession.sparkContext
--- End diff --

Why do this using a Spark job, instead of just a local thread pool?

I see this is the same thing done by `InMemoryFileIndex`, but it feels 
unnecessarily expensive.
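
For comparison, a minimal sketch of the local-thread-pool alternative being suggested; `LocalGlobSketch` and `globInParallel` are hypothetical names, and this is not the PR's implementation:

```scala
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: expand each globbed path on a local fixed-size thread pool
// instead of scheduling a Spark job for it.
object LocalGlobSketch {
  def globInParallel(fs: FileSystem, paths: Seq[Path], numThreads: Int): Seq[Path] = {
    val pool = Executors.newFixedThreadPool(numThreads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = paths.map { p =>
        // globStatus returns null when nothing matches; treat that as "no paths".
        Future(Option(fs.globStatus(p)).toSeq.flatten.map(_.getPath))
      }
      Await.result(Future.sequence(futures), Duration.Inf).flatten
    } finally {
      pool.shutdown()
    }
  }
}
```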





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19829
  
**[Test build #84671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84671/testReport)** for PR 19829 at commit [`96df5f2`](https://github.com/apache/spark/commit/96df5f26d163a4a17d8ab824995b57992afa6b8b).





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19829
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84670/
Test FAILed.





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19829
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19829
  
**[Test build #84670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84670/testReport)** for PR 19829 at commit [`c7cffe9`](https://github.com/apache/spark/commit/c7cffe9f284762432d1d320845c60f8586f434af).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MessageWithHeader extends AbstractFileRegion `
  * `  static class EncryptedMessage extends AbstractFileRegion `
  * `public abstract class AbstractFileRegion extends AbstractReferenceCounted implements FileRegion `





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19829
  
**[Test build #84670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84670/testReport)** for PR 19829 at commit [`c7cffe9`](https://github.com/apache/spark/commit/c7cffe9f284762432d1d320845c60f8586f434af).





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19829
  
**[Test build #84669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84669/testReport)** for PR 19829 at commit [`9a07d2b`](https://github.com/apache/spark/commit/9a07d2b64c506a6d822dea27ca76255c084569f4).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19829
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84669/
Test FAILed.





[GitHub] spark pull request #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/19829#discussion_r155904309
  
--- Diff: common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java ---
@@ -203,6 +203,63 @@ public long transfered() {
       return transferred;
     }
 
+    @Override
--- End diff --

Good point! Updated!





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19829
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19829
  
**[Test build #84669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84669/testReport)** for PR 19829 at commit [`9a07d2b`](https://github.com/apache/spark/commit/9a07d2b64c506a6d822dea27ca76255c084569f4).





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19829
  
**[Test build #84668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84668/testReport)** for PR 19829 at commit [`94df9ae`](https://github.com/apache/spark/commit/94df9ae0bf5216f70c17b8aed297cda22f9566f4).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MessageWithHeader extends AbstractFileRegion `
  * `  static class EncryptedMessage extends AbstractFileRegion `
  * `public abstract class AbstractFileRegion extends AbstractReferenceCounted implements FileRegion `





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19829
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84668/
Test FAILed.





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19829
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19829: [WIP]Upgrade Netty to 4.1.17

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19829
  
**[Test build #84668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84668/testReport)** for PR 19829 at commit [`94df9ae`](https://github.com/apache/spark/commit/94df9ae0bf5216f70c17b8aed297cda22f9566f4).





[GitHub] spark issue #19930: [SPARK-22279][SQL][FOLLOWUP] Preserve a test case assump...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19930
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19930: [SPARK-22279][SQL][FOLLOWUP] Preserve a test case assump...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19930
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84662/
Test FAILed.





[GitHub] spark issue #19930: [SPARK-22279][SQL][FOLLOWUP] Preserve a test case assump...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19930
  
**[Test build #84662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84662/testReport)** for PR 19930 at commit [`2449812`](https://github.com/apache/spark/commit/244981250ecfed35c53db96740284bcdb83fa0db).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155900585
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -119,5 +117,46 @@ private[spark] object Config extends Logging {
         "must be a positive integer")
       .createWithDefault(10)
 
+  val WAIT_FOR_APP_COMPLETION =
+    ConfigBuilder("spark.kubernetes.submission.waitAppCompletion")
+      .doc("In cluster mode, whether to wait for the application to finish before exiting the " +
+        "launcher process.")
+      .booleanConf
+      .createWithDefault(true)
+
+  val REPORT_INTERVAL =
+    ConfigBuilder("spark.kubernetes.report.interval")
+      .doc("Interval between reports of the current app status in cluster mode.")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .checkValue(interval => interval > 0, s"Logging interval must be a positive time value.")
+      .createWithDefaultString("1s")
+
+  private[spark] val JARS_DOWNLOAD_LOCATION =
--- End diff --

nit: `private[spark]` is redundant in this object.
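
A tiny illustration of the point, using a hypothetical `ConfigSketch` object rather than the real `Config`; the values are placeholders:

```scala
package org.apache.spark.deploy.k8s

// The enclosing object is already private[spark], so its members are not visible outside
// org.apache.spark anyway; repeating private[spark] on a member changes nothing.
private[spark] object ConfigSketch {
  val JARS_DOWNLOAD_LOCATION: String = "/var/spark-data/spark-jars"                     // hypothetical value
  private[spark] val FILES_DOWNLOAD_LOCATION: String = "/var/spark-data/spark-files"    // redundant modifier
}
```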





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155900650
  
--- Diff: docs/configuration.md ---
@@ -157,13 +157,31 @@ of the most common options to set are:
   or in your default properties file.
   </td>
 </tr>
+<tr>
+  <td><code>spark.driver.memoryOverhead</code></td>
+  <td>driverMemory * 0.10, with minimum of 384 </td>
+  <td>
+    The amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode. This is
+    memory that accounts for things like VM overheads, interned strings, other native overheads, etc.
+    This tends to grow with the container size (typically 6-10%).
--- End diff --

Should mention that not all cluster managers support this option, since 
this is now in the common configuration doc. Same below.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19769: [SPARK-12297][SQL] Adjust timezone for int96 data from i...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19769
  
**[Test build #84667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84667/testReport)** for PR 19769 at commit [`1ea75c0`](https://github.com/apache/spark/commit/1ea75c0a8f2c5fed33b2a6d6102ad1d8bdf73906).





[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...

2017-12-08 Thread henryr
Github user henryr commented on a diff in the pull request:

https://github.com/apache/spark/pull/19769#discussion_r155870601
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -363,9 +370,25 @@ class ParquetFileFormat
         fileSplit.getLocations,
         null)
 
+      val sharedConf = broadcastedHadoopConf.value.value
+      // PARQUET_INT96_TIMESTAMP_CONVERSION says to apply timezone conversions to int96 timestamps'
+      // *only* if the file was created by something other than "parquet-mr", so check the actual
+      // writer here for this file.  We have to do this per-file, as each file in the table may
+      // have different writers.
+      def isCreatedByParquetMr(): Boolean = {
+        val footer = ParquetFileReader.readFooter(sharedConf, fileSplit.getPath, SKIP_ROW_GROUPS)
+        footer.getFileMetaData().getCreatedBy().startsWith("parquet-mr")
+      }
+      val convertTz =
+        if (timestampConversion && !isCreatedByParquetMr()) {
+          Some(DateTimeUtils.getTimeZone(sharedConf.get(SQLConf.SESSION_LOCAL_TIMEZONE.key)))
+        } else {
+          None
+        }
+
       val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
       val hadoopAttemptContext =
-        new TaskAttemptContextImpl(broadcastedHadoopConf.value.value, attemptId)
+        new TaskAttemptContextImpl(broadcastedHadoopConf.value.value, attemptId);
--- End diff --

Done





[GitHub] spark pull request #19769: [SPARK-12297][SQL] Adjust timezone for int96 data...

2017-12-08 Thread henryr
Github user henryr commented on a diff in the pull request:

https://github.com/apache/spark/pull/19769#discussion_r155870513
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java ---
@@ -105,10 +112,19 @@
    */
   private final MemoryMode MEMORY_MODE;
 
-  public VectorizedParquetRecordReader(boolean useOffHeap) {
+  public VectorizedParquetRecordReader(TimeZone convertTz, boolean useOffHeap) {
+    this.convertTz = convertTz;
     MEMORY_MODE = useOffHeap ? MemoryMode.OFF_HEAP : MemoryMode.ON_HEAP;
   }
 
+  public VectorizedParquetRecordReader(boolean useOffHeap) {
+    this(null, useOffHeap);
+  }
+
+  VectorizedParquetRecordReader() {
--- End diff --

Removed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155898310
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
+  s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
--- End diff --

Ah, I may have forgotten to actually make the change. Anyway, it's done now.
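As an aside on the diff quoted above, the resource name prefix derivation is easy to check in isolation; a small sketch with hypothetical values:

```scala
// Hypothetical app name and launch time; dots are replaced and the result is
// lower-cased so the prefix can be used in Kubernetes resource names.
val appName = "Spark.Pi"
val launchTime = 1512748800000L
val kubernetesResourceNamePrefix =
  s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
// => "spark-pi-1512748800000"
```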

[GitHub] spark issue #19929: [SPARK-22629][PYTHON] Add deterministic flag to pyspark ...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19929
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19929: [SPARK-22629][PYTHON] Add deterministic flag to pyspark ...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19929
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84660/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19929: [SPARK-22629][PYTHON] Add deterministic flag to pyspark ...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19929
  
**[Test build #84660 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84660/testReport)**
 for PR 19929 at commit 
[`6187d5a`](https://github.com/apache/spark/commit/6187d5a0df7c409a49cd636eb74dea9323044c6b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19717
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19717
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84658/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19717
  
**[Test build #84658 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84658/testReport)**
 for PR 19717 at commit 
[`7d2b303`](https://github.com/apache/spark/commit/7d2b30373b2e4d8d5311e10c3f9a62a2d900d568).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155896644
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
+  s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
--- End diff --

I don't follow. The code still says `Option.empty` when Matt asked for it to be replaced with `None`.
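For readers skimming the thread, the requested change is just the fallback arm of the match; a self-contained sketch with simplified stand-ins for the PR's types (not the actual classes):

```scala
object MainJarExample {
  sealed trait MainAppResource
  case class JavaMainAppResource(resource: String) extends MainAppResource

  // Stand-in for SparkLauncher.NO_RESOURCE.
  val NO_RESOURCE = "spark-internal"

  // Extract the main app resource, using None (as the reviewers asked)
  // rather than Option.empty in the fallback case.
  def additionalMainAppJar(mainAppResource: Option[MainAppResource]): Option[String] =
    mainAppResource.flatMap {
      case JavaMainAppResource(resource) if resource != NO_RESOURCE => Some(resource)
      case _ => None
    }
}
```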


---

-

[GitHub] spark issue #19751: [SPARK-20653][core] Add cleaning of old elements from th...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19751
  
**[Test build #84665 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84665/testReport)**
 for PR 19751 at commit 
[`2606fcd`](https://github.com/apache/spark/commit/2606fcd6493ce7a57f3555c2613d43f1a0391bf7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19591
  
**[Test build #84666 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84666/testReport)**
 for PR 19591 at commit 
[`8013766`](https://github.com/apache/spark/commit/8013766d730b9fa14b9d0c71d527dfcfcead8af1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19925: [SPARK-22732] Add Structured Streaming APIs to DataSourc...

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19925
  
**[Test build #84664 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84664/testReport)**
 for PR 19925 at commit 
[`4d166de`](https://github.com/apache/spark/commit/4d166ded90b071332c42704070e98e581fa92042).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...

2017-12-08 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/19843
  
LGTM, but I'll wait for the PR title & description updates to merge this.  
Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-12-08 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/19843#discussion_r155896038
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.util
+
+import java.io.File
+
+import org.scalatest.Suite
+
+import org.apache.spark.SparkContext
+import org.apache.spark.ml.Transformer
+import org.apache.spark.sql.{DataFrame, Encoder, Row}
+import org.apache.spark.sql.execution.streaming.MemoryStream
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.sql.test.TestSparkSession
+import org.apache.spark.util.Utils
+
+trait MLTest extends StreamTest with TempDirectory { self: Suite =>
+
+  @transient var sc: SparkContext = _
+  @transient var checkpointDir: String = _
+
+  protected override def createSparkSession: TestSparkSession = {
+new TestSparkSession(new SparkContext("local[2]", "MLlibUnitTest", 
sparkConf))
+  }
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+sc = spark.sparkContext
+checkpointDir = Utils.createDirectory(tempDir.getCanonicalPath, 
"checkpoints").toString
+sc.setCheckpointDir(checkpointDir)
+  }
+
+  override def afterAll() {
--- End diff --

Well, we'll find out in a few weeks  : )
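For context on how the trait quoted above would be used, a hypothetical consumer (names here are illustrative, and this assumes MLTest exposes `spark` via StreamTest's shared session, as the createSparkSession override suggests):

```scala
// Hypothetical suite mixing in MLTest: it gets a TestSparkSession, a
// SparkContext (`sc`) and a checkpoint directory wired up in beforeAll.
class MyTransformerSuite extends MLTest {
  test("basic DataFrame round trip") {
    val df = spark.range(0, 10).toDF("id")
    assert(df.count() == 10)
  }
}
```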


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite support S...

2017-12-08 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/19843
  
Also, can you please remove "WIP" from the PR title and update the Testing 
part of the PR description?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #84663 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84663/testReport)**
 for PR 19884 at commit 
[`fdba406`](https://github.com/apache/spark/commit/fdba406f29216b8ef592de45dc36461217113410).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19884
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84663/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19884
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #84663 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84663/testReport)**
 for PR 19884 at commit 
[`fdba406`](https://github.com/apache/spark/commit/fdba406f29216b8ef592de45dc36461217113410).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


