[GitHub] spark issue #20720: [SPARK-23570] [SQL] Add Spark 2.3.0 in HiveExternalCatal...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20720
  
**[Test build #87900 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87900/testReport)**
 for PR 20720 at commit 
[`e99e425`](https://github.com/apache/spark/commit/e99e42516c8255d07990ff05f0bfadc4fa8c4244).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20720: [SPARK-23570] [SQL] Add Spark 2.3.0 in HiveExternalCatal...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20720
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20722: [SPARK-23571][K8S] Delete auxiliary Kubernetes re...

2018-03-02 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20722#discussion_r171975765
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 ---
@@ -159,6 +159,19 @@ private[spark] class Client(
 logInfo(s"Waiting for application $appName to finish...")
 watcher.awaitCompletion()
 logInfo(s"Application $appName finished.")
+try {
--- End diff --

Per the discussion offline, I think the right solution is to move resource management to the driver pod. This way, resource cleanup is guaranteed regardless of the deployment mode.


---




[GitHub] spark issue #20721: [DO-NOT-MERGE] [TEST] Try different spark.sql.sources.de...

2018-03-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20721
  
@dongjoon-hyun I am just trying to see how many queries fail and whether they are all included.


---




[GitHub] spark issue #20718: [SPARK-23514][FOLLOW-UP] Remove more places using sparkC...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20718
  
**[Test build #87902 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87902/testReport)**
 for PR 20718 at commit 
[`358a76a`](https://github.com/apache/spark/commit/358a76ae781016586a892aa077d88f0c27876d76).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20718: [SPARK-23514][FOLLOW-UP] Remove more places using sparkC...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20718
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87902/
Test PASSed.


---




[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20710
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87909/
Test FAILed.


---




[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20710
  
**[Test build #87909 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87909/testReport)**
 for PR 20710 at commit 
[`cb6b2cf`](https://github.com/apache/spark/commit/cb6b2cfeab71abc443f82cc4b016bfc572d45050).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20698
  
**[Test build #87911 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87911/testReport)**
 for PR 20698 at commit 
[`1e244e0`](https://github.com/apache/spark/commit/1e244e0d484d13d08b5afeed45bcbd9805254fe1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20632: [SPARK-3159][ML] Add decision tree pruning

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20632
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87908/
Test PASSed.


---




[GitHub] spark issue #20632: [SPARK-3159][ML] Add decision tree pruning

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20632
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20632: [SPARK-3159][ML] Add decision tree pruning

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20632
  
**[Test build #87908 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87908/testReport)**
 for PR 20632 at commit 
[`0183678`](https://github.com/apache/spark/commit/01836782c828db8ddd928726114d464e273b0a86).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20698
  
**[Test build #87913 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87913/testReport)**
 for PR 20698 at commit 
[`602ab36`](https://github.com/apache/spark/commit/602ab36490a692080682867f98a8a5d8f7b2390d).


---




[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20710
  
**[Test build #87912 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87912/testReport)**
 for PR 20710 at commit 
[`544eb1b`](https://github.com/apache/spark/commit/544eb1b296bceb213965bf3c5dc1a6264c5b7acd).


---




[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...

2018-03-02 Thread misutoth
Github user misutoth commented on a diff in the pull request:

https://github.com/apache/spark/pull/20618#discussion_r171991788
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1500,31 +1534,35 @@ object functions {
   }
 
   /**
-   * Computes the cosine of the given value. Units in radians.
+   * @param e angle in radians
+   * @return cosine of the angle, as if computed by [[java.lang.Math#cos]]
--- End diff --

Replaced all of them as suggested.


---




[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...

2018-03-02 Thread misutoth
Github user misutoth commented on a diff in the pull request:

https://github.com/apache/spark/pull/20618#discussion_r171991647
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
 ---
@@ -512,7 +529,11 @@ case class Rint(child: Expression) extends 
UnaryMathExpression(math.rint, "ROUND
 case class Signum(child: Expression) extends 
UnaryMathExpression(math.signum, "SIGNUM")
 
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the sine of `expr`.",
+  usage = "_FUNC_(expr) - Returns the sine of `expr`, as if computed by 
`java.lang.Math._FUNC_`.",
--- End diff --

As far as the trigonometric functions are concerned, they should now be in sync, I think.


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20618
  
**[Test build #87914 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87914/testReport)**
 for PR 20618 at commit 
[`627e204`](https://github.com/apache/spark/commit/627e204ed03cfd6caa06e8f64dc605b62f4d2e5e).


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1246/
Test PASSed.


---




[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...

2018-03-02 Thread misutoth
Github user misutoth commented on a diff in the pull request:

https://github.com/apache/spark/pull/20618#discussion_r171991691
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -173,16 +172,26 @@ def _():
 
 _functions_2_1 = {
 # unary math functions
-'degrees': 'Converts an angle measured in radians to an approximately 
equivalent angle ' +
-   'measured in degrees.',
-'radians': 'Converts an angle measured in degrees to an approximately 
equivalent angle ' +
-   'measured in radians.',
+'degrees': """Converts an angle measured in radians to an 
approximately equivalent angle
+   measured in degrees.
--- End diff --

added


---




[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...

2018-03-02 Thread misutoth
Github user misutoth commented on a diff in the pull request:

https://github.com/apache/spark/pull/20618#discussion_r171991733
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -173,16 +172,26 @@ def _():
 
 _functions_2_1 = {
 # unary math functions
-'degrees': 'Converts an angle measured in radians to an approximately 
equivalent angle ' +
-   'measured in degrees.',
-'radians': 'Converts an angle measured in degrees to an approximately 
equivalent angle ' +
-   'measured in radians.',
+'degrees': """Converts an angle measured in radians to an 
approximately equivalent angle
+   measured in degrees.
+   :param col: angle in radians
+   :return: angle in degrees, as if computed by 
`java.lang.Math.toDegrees()`""",
+'radians': """Converts an angle measured in degrees to an 
approximately equivalent angle measured in radians.
--- End diff --

Yes. I wrapped the text.


---




[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20710
  
**[Test build #87915 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87915/testReport)**
 for PR 20710 at commit 
[`79495b1`](https://github.com/apache/spark/commit/79495b1f9e994f77ccf40c47eb2fb0baf5873f66).


---




[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20705
  
**[Test build #87916 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87916/testReport)**
 for PR 20705 at commit 
[`3ec9309`](https://github.com/apache/spark/commit/3ec9309f923405873d73a6e5d9376bba08e050d0).


---




[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20686
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87904/
Test PASSed.


---




[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20686
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20722: [SPARK-23571][K8S] Delete auxiliary Kubernetes re...

2018-03-02 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/20722#discussion_r171968886
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 ---
@@ -159,6 +159,19 @@ private[spark] class Client(
 logInfo(s"Waiting for application $appName to finish...")
 watcher.awaitCompletion()
 logInfo(s"Application $appName finished.")
+try {
--- End diff --

(also because this code path isn't invoked in fire-and-forget mode IIUC)


---




[GitHub] spark issue #20714: [SPARK-23457][SQL][BRANCH-2.3] Register task completion ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20714
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20714: [SPARK-23457][SQL][BRANCH-2.3] Register task completion ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20714
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1242/
Test PASSed.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20700
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87899/
Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1243/
Test PASSed.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20700
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #87906 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87906/testReport)**
 for PR 20208 at commit 
[`6ae471c`](https://github.com/apache/spark/commit/6ae471c8ecaae3eb3888eecaac1c4e7552bedcc6).


---




[GitHub] spark pull request #20722: [SPARK-23571][K8S] Delete auxiliary Kubernetes re...

2018-03-02 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/20722#discussion_r171972187
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 ---
@@ -159,6 +159,19 @@ private[spark] class Client(
 logInfo(s"Waiting for application $appName to finish...")
 watcher.awaitCompletion()
 logInfo(s"Application $appName finished.")
+try {
--- End diff --

I see! It was the RBAC rules and the need to downscope them that led us here. I'm concerned that not all jobs will actually use this interactive mode of launching. What do you think of just granting more permissions to the driver and allowing cleanup there?


---




[GitHub] spark issue #20720: [SPARK-23570] [SQL] Add Spark 2.3.0 in HiveExternalCatal...

2018-03-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20720
  
Thanks!

Also cc @cloud-fan 

Merged to master/2.3



---




[GitHub] spark pull request #20632: [SPARK-3159][ML] Add decision tree pruning

2018-03-02 Thread asolimando
Github user asolimando commented on a diff in the pull request:

https://github.com/apache/spark/pull/20632#discussion_r171983753
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -703,4 +707,16 @@ private object RandomForestSuite {
 val (indices, values) = map.toSeq.sortBy(_._1).unzip
 Vectors.sparse(size, indices.toArray, values.toArray)
   }
+
+  @tailrec
+  private def getSumLeafCounters(nodes: List[Node], acc: Long = 0): Long =
--- End diff --

Sorry, I have added them


---




[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-03-02 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/20698#discussion_r171988510
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculatorSuite.scala
 ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import scala.collection.JavaConverters._
+
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.sources.v2.DataSourceOptions
+
+class KafkaOffsetRangeCalculatorSuite extends SparkFunSuite {
+
+  def testWithMinPartitions(name: String, minPartition: Int)
+  (f: KafkaOffsetRangeCalculator => Unit): Unit = {
+val options = new DataSourceOptions(Map("minPartitions" -> 
minPartition.toString).asJava)
+test(s"with minPartition = $minPartition: $name") {
+  f(KafkaOffsetRangeCalculator(options))
+}
+  }
+
+
+  test("with no minPartition: N TopicPartitions to N offset ranges") {
+val calc = KafkaOffsetRangeCalculator(DataSourceOptions.empty())
+assert(
+  calc.getRanges(
+fromOffsets = Map(tp1 -> 1),
+untilOffsets = Map(tp1 -> 2)) ==
+  Seq(KafkaOffsetRange(tp1, 1, 2, None)))
+
+assert(
+  calc.getRanges(
+fromOffsets = Map(tp1 -> 1),
+untilOffsets = Map(tp1 -> 2, tp2 -> 1), Seq.empty) ==
+  Seq(KafkaOffsetRange(tp1, 1, 2, None)))
+
+assert(
+  calc.getRanges(
+fromOffsets = Map(tp1 -> 1, tp2 -> 1),
+untilOffsets = Map(tp1 -> 2)) ==
+  Seq(KafkaOffsetRange(tp1, 1, 2, None)))
+
+assert(
+  calc.getRanges(
+fromOffsets = Map(tp1 -> 1, tp2 -> 1),
+untilOffsets = Map(tp1 -> 2),
+executorLocations = Seq("location")) ==
+  Seq(KafkaOffsetRange(tp1, 1, 2, Some("location"))))
+  }
+
+  test("with no minPartition: empty ranges ignored") {
+val calc = KafkaOffsetRangeCalculator(DataSourceOptions.empty())
+assert(
+  calc.getRanges(
+fromOffsets = Map(tp1 -> 1, tp2 -> 1),
+untilOffsets = Map(tp1 -> 2, tp2 -> 1)) ==
+  Seq(KafkaOffsetRange(tp1, 1, 2, None)))
+  }
+
+  testWithMinPartitions("N TopicPartitions to N offset ranges", 3) { calc 
=>
+assert(
+  calc.getRanges(
+fromOffsets = Map(tp1 -> 1, tp2 -> 1, tp3 -> 1),
+untilOffsets = Map(tp1 -> 2, tp2 -> 2, tp3 -> 2)) ==
+  Seq(
+KafkaOffsetRange(tp1, 1, 2, None),
+KafkaOffsetRange(tp2, 1, 2, None),
+KafkaOffsetRange(tp3, 1, 2, None)))
+  }
+
+  testWithMinPartitions("1 TopicPartition to N offset ranges", 4) { calc =>
+assert(
+  calc.getRanges(
+fromOffsets = Map(tp1 -> 1),
+untilOffsets = Map(tp1 -> 5)) ==
+  Seq(
+KafkaOffsetRange(tp1, 1, 2, None),
+KafkaOffsetRange(tp1, 2, 3, None),
+KafkaOffsetRange(tp1, 3, 4, None),
+KafkaOffsetRange(tp1, 4, 5, None)))
+
+assert(
+  calc.getRanges(
+fromOffsets = Map(tp1 -> 1),
+untilOffsets = Map(tp1 -> 5),
+executorLocations = Seq("location")) ==
+Seq(
+  KafkaOffsetRange(tp1, 1, 2, None),
+  KafkaOffsetRange(tp1, 2, 3, None),
+  KafkaOffsetRange(tp1, 3, 4, None),
+  KafkaOffsetRange(tp1, 4, 5, None))) // location pref not set 
when minPartition is set
+  }
+
+  testWithMinPartitions("N skewed TopicPartitions to M offset ranges", 3) 
{ calc =>
--- End diff --

can you also add a test:
```
fromOffsets = Map(tp1 -> 1),
untilOffsets = Map(tp1 -> 10)
minPartitions = 3
```
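A sketch of what that case might look like, following the suite's existing `testWithMinPartitions` helper; the three expected ranges below are only what the calculator logic in this diff would produce for offsets 1..10 split three ways (splits at 4 and 7), so treat them as illustrative rather than prescribed:

```scala
testWithMinPartitions("1 TopicPartition with a large range to N offset ranges", 3) { calc =>
  assert(
    calc.getRanges(
      fromOffsets = Map(tp1 -> 1),
      untilOffsets = Map(tp1 -> 10)) ==
    Seq(
      KafkaOffsetRange(tp1, 1, 4, None),
      KafkaOffsetRange(tp1, 4, 7, None),
      KafkaOffsetRange(tp1, 7, 10, None)))
}
```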


---


[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-03-02 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/20698#discussion_r17198
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala
 ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.sql.sources.v2.DataSourceOptions
+
+
+/**
+ * Class to calculate offset ranges to process based on the the from and 
until offsets, and
+ * the configured `minPartitions`.
+ */
+private[kafka010] class KafkaOffsetRangeCalculator(val minPartitions: 
Option[Int]) {
+  require(minPartitions.isEmpty || minPartitions.get > 0)
+
+  import KafkaOffsetRangeCalculator._
+  /**
+   * Calculate the offset ranges that we are going to process this batch. 
If `minPartitions`
+   * is not set or is set less than or equal the number of 
`topicPartitions` that we're going to
+   * consume, then we fall back to a 1-1 mapping of Spark tasks to Kafka 
partitions. If
+   * `numPartitions` is set higher than the number of our 
`topicPartitions`, then we will split up
+   * the read tasks of the skewed partitions to multiple Spark tasks.
+   * The number of Spark tasks will be *approximately* `numPartitions`. It 
can be less or more
+   * depending on rounding errors or Kafka partitions that didn't receive 
any new data.
+   */
+  def getRanges(
+  fromOffsets: PartitionOffsetMap,
+  untilOffsets: PartitionOffsetMap,
+  executorLocations: Seq[String] = Seq.empty): Seq[KafkaOffsetRange] = 
{
+val partitionsToRead = 
untilOffsets.keySet.intersect(fromOffsets.keySet)
+
+val offsetRanges = partitionsToRead.toSeq.map { tp =>
+  KafkaOffsetRange(tp, fromOffsets(tp), untilOffsets(tp), preferredLoc 
= None)
+}.filter(_.size > 0)
+
+// If minPartitions not set or there are enough partitions to satisfy 
minPartitions
+if (minPartitions.isEmpty || offsetRanges.size > minPartitions.get) {
+  // Assign preferred executor locations to each range such that the 
same topic-partition is
+  // preferentially read from the same executor and the KafkaConsumer 
can be reused.
+  offsetRanges.map { range =>
+range.copy(preferredLoc = getLocation(range.topicPartition, 
executorLocations))
+  }
+} else {
+
+  // Splits offset ranges with relatively large amount of data to 
smaller ones.
+  val totalSize = offsetRanges.map(o => o.untilOffset - 
o.fromOffset).sum
--- End diff --

nit: `map(_.size).sum`
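i.e. something like this (sketch; `KafkaOffsetRange.size` is assumed to be the `untilOffset - fromOffset` delta already relied on by the `filter(_.size > 0)` above):

```scala
// Splits offset ranges with relatively large amounts of data into smaller ones.
val totalSize = offsetRanges.map(_.size).sum
```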


---




[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-03-02 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/20698#discussion_r171987888
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala
 ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.sql.sources.v2.DataSourceOptions
+
+
+/**
+ * Class to calculate offset ranges to process based on the the from and 
until offsets, and
+ * the configured `minPartitions`.
+ */
+private[kafka010] class KafkaOffsetRangeCalculator(val minPartitions: 
Option[Int]) {
+  require(minPartitions.isEmpty || minPartitions.get > 0)
+
+  import KafkaOffsetRangeCalculator._
+  /**
+   * Calculate the offset ranges that we are going to process this batch. 
If `minPartitions`
+   * is not set or is set less than or equal the number of 
`topicPartitions` that we're going to
+   * consume, then we fall back to a 1-1 mapping of Spark tasks to Kafka 
partitions. If
+   * `numPartitions` is set higher than the number of our 
`topicPartitions`, then we will split up
+   * the read tasks of the skewed partitions to multiple Spark tasks.
+   * The number of Spark tasks will be *approximately* `numPartitions`. It 
can be less or more
+   * depending on rounding errors or Kafka partitions that didn't receive 
any new data.
+   */
+  def getRanges(
+  fromOffsets: PartitionOffsetMap,
+  untilOffsets: PartitionOffsetMap,
+  executorLocations: Seq[String] = Seq.empty): Seq[KafkaOffsetRange] = 
{
+val partitionsToRead = 
untilOffsets.keySet.intersect(fromOffsets.keySet)
+
+val offsetRanges = partitionsToRead.toSeq.map { tp =>
+  KafkaOffsetRange(tp, fromOffsets(tp), untilOffsets(tp), preferredLoc 
= None)
+}.filter(_.size > 0)
+
+// If minPartitions not set or there are enough partitions to satisfy 
minPartitions
+if (minPartitions.isEmpty || offsetRanges.size > minPartitions.get) {
+  // Assign preferred executor locations to each range such that the 
same topic-partition is
+  // preferentially read from the same executor and the KafkaConsumer 
can be reused.
+  offsetRanges.map { range =>
+range.copy(preferredLoc = getLocation(range.topicPartition, 
executorLocations))
+  }
+} else {
+
+  // Splits offset ranges with relatively large amount of data to 
smaller ones.
+  val totalSize = offsetRanges.map(o => o.untilOffset - 
o.fromOffset).sum
+  val idealRangeSize = totalSize.toDouble / minPartitions.get
+
+  offsetRanges.flatMap { range =>
+// Split the current range into subranges as close to the ideal 
range size
+val rangeSize = range.untilOffset - range.fromOffset
+val numSplitsInRange = math.round(rangeSize.toDouble / 
idealRangeSize).toInt
--- End diff --

nit: `range.size`, you may remove `rangeSize` above
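For example (sketch of the suggested simplification, reusing `KafkaOffsetRange.size` instead of the local `rangeSize`):

```scala
val numSplitsInRange = math.round(range.size.toDouble / idealRangeSize).toInt
```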


---




[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20710#discussion_r171993066
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java
 ---
@@ -48,6 +48,9 @@
*  same task id but different attempt number, which 
means there are multiple
*  tasks with the same task id running at the same 
time. Implementations can
*  use this attempt number to distinguish writers 
of different task attempts.
+   * @param epochId A monotonically increasing id for streaming queries 
that are split in to
+   *discrete periods of execution. For queries that 
execute as a single batch, this
+   *id will always be zero.
*/
-  DataWriter createDataWriter(int partitionId, int attemptNumber);
+  DataWriter createDataWriter(int partitionId, int attemptNumber, long 
epochId);
--- End diff --

Add clear lifecycle semantics.


---




[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20710#discussion_r171993101
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
 ---
@@ -198,7 +201,7 @@ object DataWritingSparkTask extends Logging {
   })(catchBlock = {
 // If there is an error, abort this writer
 logError(s"Writer for partition ${context.partitionId()} is 
aborting.")
-dataWriter.abort()
+if (dataWriter != null) dataWriter.abort()
 logError(s"Writer for partition ${context.partitionId()} aborted.")
--- End diff --

nit: add comment that the exception will be rethrown.
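Roughly along these lines (sketch; the exact wording is up to the author, the thread only asks to note that the exception is rethrown):

```scala
// If there is an error, abort this writer. The original exception is rethrown
// by tryWithSafeFinallyAndFailureCallbacks after this catch block runs.
logError(s"Writer for partition ${context.partitionId()} is aborting.")
if (dataWriter != null) dataWriter.abort()
logError(s"Writer for partition ${context.partitionId()} aborted.")
```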


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87913/
Test PASSed.


---




[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20710
  
**[Test build #87918 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87918/testReport)**
 for PR 20710 at commit 
[`215c225`](https://github.com/apache/spark/commit/215c225c5a1623cfa02f617201e21067bbf6088a).


---




[GitHub] spark pull request #20722: [SPARK-23571][K8S] Delete auxiliary Kubernetes re...

2018-03-02 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/20722#discussion_r171968410
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 ---
@@ -159,6 +159,19 @@ private[spark] class Client(
 logInfo(s"Waiting for application $appName to finish...")
 watcher.awaitCompletion()
 logInfo(s"Application $appName finished.")
+try {
--- End diff --

Why are we doing this in client code? Driver shutdown is the right place to perform cleanup, right?


---




[GitHub] spark issue #20723: [SPARK-23538][core] Remove custom configuration for SSL ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20723
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1244/
Test PASSed.


---




[GitHub] spark issue #20723: [SPARK-23538][core] Remove custom configuration for SSL ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20723
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20632: [SPARK-3159][ML] Add decision tree pruning

2018-03-02 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/20632#discussion_r171982559
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -703,4 +707,16 @@ private object RandomForestSuite {
 val (indices, values) = map.toSeq.sortBy(_._1).unzip
 Vectors.sparse(size, indices.toArray, values.toArray)
   }
+
+  @tailrec
+  private def getSumLeafCounters(nodes: List[Node], acc: Long = 0): Long =
--- End diff --

Need to enclose the function body in curly braces
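A minimal sketch of the requested style; `Node`, `children`, and `leafCount` here are stand-ins rather than the real spark.ml tree API, since the function body isn't shown in this hunk:

```scala
import scala.annotation.tailrec

// Stand-in node type, only to illustrate the brace style; not the spark.ml Node class.
final case class Node(children: List[Node] = Nil, leafCount: Long = 0L)

// The review point: wrap a multi-line @tailrec body in curly braces.
@tailrec
def getSumLeafCounters(nodes: List[Node], acc: Long = 0): Long = {
  nodes match {
    case Nil          => acc
    case head :: tail => getSumLeafCounters(tail ++ head.children, acc + head.leafCount)
  }
}
```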


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87911/
Test PASSed.


---




[GitHub] spark issue #20632: [SPARK-3159][ML] Add decision tree pruning

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20632
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87910/
Test PASSed.


---




[GitHub] spark issue #20632: [SPARK-3159][ML] Add decision tree pruning

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20632
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20710#discussion_r171993276
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
 ---
@@ -172,17 +173,19 @@ object DataWritingSparkTask extends Logging {
   writeTask: DataWriterFactory[InternalRow],
   context: TaskContext,
   iter: Iterator[InternalRow]): WriterCommitMessage = {
-val dataWriter = writeTask.createDataWriter(context.partitionId(), 
context.attemptNumber())
 val epochCoordinator = EpochCoordinatorRef.get(
   
context.getLocalProperty(ContinuousExecution.EPOCH_COORDINATOR_ID_KEY),
   SparkEnv.get)
 val currentMsg: WriterCommitMessage = null
 var currentEpoch = 
context.getLocalProperty(ContinuousExecution.START_EPOCH_KEY).toLong
 
 do {
+  var dataWriter: DataWriter[InternalRow] = null
   // write the data and commit this writer.
   Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
 try {
+  dataWriter = writeTask.createDataWriter(
+context.partitionId(), context.attemptNumber(), currentEpoch)
   iter.foreach(dataWriter.write)
--- End diff --

fix this! don't use foreach.
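The thread doesn't spell out the preferred replacement; a minimal foreach-free form would be an explicit loop over the iterator, e.g.:

```scala
// Equivalent to iter.foreach(dataWriter.write), written as an explicit loop.
while (iter.hasNext) {
  dataWriter.write(iter.next())
}
```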


---




[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20710#discussion_r171993467
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java 
---
@@ -36,8 +36,9 @@
  * {@link DataSourceWriter#commit(WriterCommitMessage[])} with commit 
messages from other data
  * writers. If this data writer fails(one record fails to write or {@link 
#commit()} fails), an
  * exception will be sent to the driver side, and Spark will retry this 
writing task for some times,
--- End diff --

Spark may retry... (in continuous we don't retry the task)


---




[GitHub] spark issue #20721: [DO-NOT-MERGE] [TEST] Try different spark.sql.sources.de...

2018-03-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20721
  
Hi, @gatorsmile.
Actually, this one is tested via #20705.
Do you want to get a result on `json`?


---




[GitHub] spark issue #20714: [SPARK-23457][SQL][BRANCH-2.3] Register task completion ...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20714
  
**[Test build #87905 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87905/testReport)**
 for PR 20714 at commit 
[`d15eba7`](https://github.com/apache/spark/commit/d15eba754a59721bc7d9cdc7d374f2f323d21e41).


---




[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20705
  
Could you change the default value to json? Can all the tests pass?
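For a quick local check, the default source can be forced to JSON through the config under discussion (sketch; `spark.sql.sources.default` is the key being changed, everything else here is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("default-source-json-check")
  .config("spark.sql.sources.default", "json")  // override the default data source
  .getOrCreate()
```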


---




[GitHub] spark pull request #20723: [SPARK-23538][core] Remove custom configuration f...

2018-03-02 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/20723

[SPARK-23538][core] Remove custom configuration for SSL client.

These options were used to configure the built-in JRE SSL libraries
when downloading files from HTTPS servers. But because they were also
used to set up the now (long) removed internal HTTPS file server,
their default configuration chose convenience over security by having
overly lenient settings.

This change removes the configuration options that affect the JRE SSL
libraries. The JRE trust store can still be configured via system
properties (or globally in the JRE security config). The only lost
functionality is not being able to disable the default hostname
verifier when using spark-submit, which should be fine since Spark
itself is not using https for any internal functionality anymore.

I also removed the HTTP-related code from the REPL class loader, since
we haven't had a HTTP server for REPL-generated classes for a while.
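For reference, the trust store configuration mentioned above goes through the standard JSSE system properties, e.g. (illustrative values, not Spark-specific settings):

```scala
// Standard JSSE properties read by the JRE SSL libraries; set them on the JVM
// that performs the HTTPS download (the path and password here are placeholders).
System.setProperty("javax.net.ssl.trustStore", "/path/to/truststore.jks")
System.setProperty("javax.net.ssl.trustStorePassword", "changeit")
```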

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-23538

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20723


commit c83611eca573f3f460790f4fde7bea7ef7887839
Author: Marcelo Vanzin 
Date:   2018-03-02T21:43:48Z

[SPARK-23538][core] Remove custom configuration for SSL client.

These options were used to configure the built-in JRE SSL libraries
when downloading files from HTTPS servers. But because they were also
used to set up the now (long) removed internal HTTPS file server,
their default configuration chose convenience over security by having
overly lenient settings.

This change removes the configuration options that affect the JRE SSL
libraries. The JRE trust store can still be configured via system
properties (or globally in the JRE security config). The only lost
functionality is not being able to disable the default hostname
verifier when using spark-submit, which should be fine since Spark
itself is not using https for any internal functionality anymore.

I also removed the HTTP-related code from the REPL class loader, since
we haven't had a HTTP server for REPL-generated classes for a while.




---




[GitHub] spark pull request #20721: [DO-NOT-MERGE] [TEST] Try different spark.sql.sou...

2018-03-02 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/20721


---




[GitHub] spark issue #20723: [SPARK-23538][core] Remove custom configuration for SSL ...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20723
  
**[Test build #87907 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87907/testReport)**
 for PR 20723 at commit 
[`c83611e`](https://github.com/apache/spark/commit/c83611eca573f3f460790f4fde7bea7ef7887839).


---




[GitHub] spark pull request #20720: [SPARK-23570] [SQL] Add Spark 2.3.0 in HiveExtern...

2018-03-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20720


---




[GitHub] spark issue #20632: [SPARK-3159][ML] Add decision tree pruning

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20632
  
**[Test build #87908 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87908/testReport)**
 for PR 20632 at commit 
[`0183678`](https://github.com/apache/spark/commit/01836782c828db8ddd928726114d464e273b0a86).


---




[GitHub] spark issue #20632: [SPARK-3159][ML] Add decision tree pruning

2018-03-02 Thread asolimando
Github user asolimando commented on the issue:

https://github.com/apache/spark/pull/20632
  
Thanks for the suggestion @sethah, I have updated the PR with the extra 
check (both tests).


---




[GitHub] spark issue #20632: [SPARK-3159][ML] Add decision tree pruning

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20632
  
**[Test build #87910 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87910/testReport)**
 for PR 20632 at commit 
[`3dfe86c`](https://github.com/apache/spark/commit/3dfe86c8376c705137d3b2ae57565a3c30d4a96d).


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1245/
Test PASSed.


---




[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20710
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20718: [SPARK-23514][FOLLOW-UP] Remove more places using sparkC...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20718
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20698
  
**[Test build #87911 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87911/testReport)**
 for PR 20698 at commit 
[`1e244e0`](https://github.com/apache/spark/commit/1e244e0d484d13d08b5afeed45bcbd9805254fe1).


---




[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20698#discussion_r171989658
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala
 ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.sql.sources.v2.DataSourceOptions
+
+
+/**
+ * Class to calculate offset ranges to process based on the from and until offsets, and
+ * the configured `minPartitions`.
+ */
+private[kafka010] class KafkaOffsetRangeCalculator(val minPartitions: Option[Int]) {
+  require(minPartitions.isEmpty || minPartitions.get > 0)
+
+  import KafkaOffsetRangeCalculator._
+  /**
+   * Calculate the offset ranges that we are going to process this batch. If `minPartitions`
+   * is not set, or is set to less than or equal to the number of `topicPartitions` that we're
+   * going to consume, then we fall back to a 1-1 mapping of Spark tasks to Kafka partitions.
+   * If `minPartitions` is set higher than the number of our `topicPartitions`, then we will
+   * split up the read tasks of the skewed partitions into multiple Spark tasks.
+   * The number of Spark tasks will be *approximately* `minPartitions`. It can be less or more
+   * depending on rounding errors or Kafka partitions that didn't receive any new data.
+   */
+  def getRanges(
+      fromOffsets: PartitionOffsetMap,
+      untilOffsets: PartitionOffsetMap,
+      executorLocations: Seq[String] = Seq.empty): Seq[KafkaOffsetRange] = {
+    val partitionsToRead = untilOffsets.keySet.intersect(fromOffsets.keySet)
+
+    val offsetRanges = partitionsToRead.toSeq.map { tp =>
+      KafkaOffsetRange(tp, fromOffsets(tp), untilOffsets(tp), preferredLoc = None)
+    }.filter(_.size > 0)
+
+    // If minPartitions is not set, or there are already enough partitions to satisfy it
+    if (minPartitions.isEmpty || offsetRanges.size > minPartitions.get) {
+      // Assign preferred executor locations to each range such that the same topic-partition is
+      // preferentially read from the same executor and the KafkaConsumer can be reused.
+      offsetRanges.map { range =>
+        range.copy(preferredLoc = getLocation(range.topicPartition, executorLocations))
+      }
+    } else {
+
+      // Split offset ranges with a relatively large amount of data into smaller ones.
+      val totalSize = offsetRanges.map(o => o.untilOffset - o.fromOffset).sum
+      val idealRangeSize = totalSize.toDouble / minPartitions.get
+
+      offsetRanges.flatMap { range =>
+        // Split the current range into subranges as close to the ideal range size as possible
+        val rangeSize = range.untilOffset - range.fromOffset
+        val numSplitsInRange = math.round(rangeSize.toDouble / idealRangeSize).toInt
+
+        (0 until numSplitsInRange).map { i =>
+          val splitStart = range.fromOffset + rangeSize * (i.toDouble / numSplitsInRange)
+          val splitEnd = range.fromOffset + rangeSize * ((i.toDouble + 1) / numSplitsInRange)
+          KafkaOffsetRange(
+            range.topicPartition, splitStart.toLong, splitEnd.toLong, preferredLoc = None)
+        }
+      }
+    }
+  }
+
+  private def getLocation(tp: TopicPartition, executorLocations: Seq[String]): Option[String] = {
+    def floorMod(a: Long, b: Int): Int = ((a % b).toInt + b) % b
+
+    val numExecutors = executorLocations.length
+    if (numExecutors > 0) {
+      // This allows cached KafkaConsumers in the executors to be re-used to read the same
+      // partition in every batch.
+      Some(executorLocations(floorMod(tp.hashCode, numExecutors)))
+    } else None
+  }
+}
+
+private[kafka010] object KafkaOffsetRangeCalculator {
+
+  def apply(options:
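
To make the splitting behaviour described in the class doc concrete, here is a minimal sketch (not part of the patch) of how `getRanges` divides a skewed partition, assuming `PartitionOffsetMap` is an alias for `Map[TopicPartition, Long]`, that `KafkaOffsetRange` is the helper case class added elsewhere in this PR, and that the code runs inside the `org.apache.spark.sql.kafka010` package since the class is `private[kafka010]`:

```scala
import org.apache.kafka.common.TopicPartition

// Ask for roughly 4 Spark tasks per batch.
val calc = new KafkaOffsetRangeCalculator(minPartitions = Some(4))

val tpA = new TopicPartition("topic", 0)  // 100 new records in this batch
val tpB = new TopicPartition("topic", 1)  // 20 new records in this batch

val ranges = calc.getRanges(
  fromOffsets = Map(tpA -> 0L, tpB -> 0L),
  untilOffsets = Map(tpA -> 100L, tpB -> 20L))

// idealRangeSize = (100 + 20) / 4 = 30, so tpA is split into
// round(100 / 30) = 3 subranges and tpB stays whole: about 4 tasks total.
```

As the doc comment notes, the task count is only approximate: rounding and empty partitions can push it slightly above or below `minPartitions`.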

[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20705
  
Sure, @gatorsmile .


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20710#discussion_r171993622
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java
 ---
@@ -48,6 +48,9 @@
*  same task id but different attempt number, which 
means there are multiple
*  tasks with the same task id running at the same 
time. Implementations can
*  use this attempt number to distinguish writers 
of different task attempts.
+   * @param epochId A monotonically increasing id for streaming queries 
that are split in to
+   *discrete periods of execution. For queries that 
execute as a single batch, this
--- End diff --

For non-streaming queries, this...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
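
Not part of this PR: a minimal sketch of how a sink implementation could use the proposed epoch id for idempotent writes, assuming the factory method gains a `long epochId` parameter alongside the existing partition id and attempt number as this PR proposes; the buffering and commit-key handling below are purely illustrative.

```scala
import org.apache.spark.sql.sources.v2.writer.{DataWriter, DataWriterFactory, WriterCommitMessage}

// The (partitionId, epochId) pair identifies one epoch's output for one
// partition, so a sink can de-duplicate replayed epochs.
case class EpochCommit(partitionId: Int, epochId: Long) extends WriterCommitMessage

class BufferingWriterFactory extends DataWriterFactory[String] {
  // Assumed signature per this PR: epochId is added as a third parameter.
  override def createDataWriter(
      partitionId: Int,
      attemptNumber: Int,
      epochId: Long): DataWriter[String] = new DataWriter[String] {

    private val buffer = scala.collection.mutable.ArrayBuffer.empty[String]

    override def write(record: String): Unit = buffer += record

    override def commit(): WriterCommitMessage = {
      // A real sink would persist `buffer` keyed by (partitionId, epochId)
      // here, ignoring the write if that key was already committed.
      EpochCommit(partitionId, epochId)
    }

    override def abort(): Unit = buffer.clear()
  }
}
```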



[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20710#discussion_r171993519
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java 
---
@@ -36,8 +36,9 @@
  * {@link DataSourceWriter#commit(WriterCommitMessage[])} with commit 
messages from other data
  * writers. If this data writer fails(one record fails to write or {@link 
#commit()} fails), an
  * exception will be sent to the driver side, and Spark will retry this 
writing task for some times,
--- End diff --

for some times --> for a few times


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20710#discussion_r171993559
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java 
---
@@ -36,8 +36,9 @@
  * {@link DataSourceWriter#commit(WriterCommitMessage[])} with commit 
messages from other data
  * writers. If this data writer fails(one record fails to write or {@link 
#commit()} fails), an
  * exception will be sent to the driver side, and Spark will retry this 
writing task for some times,
--- End diff --

Break this sentence up; it is very long.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20698
  
**[Test build #87913 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87913/testReport)**
 for PR 20698 at commit 
[`602ab36`](https://github.com/apache/spark/commit/602ab36490a692080682867f98a8a5d8f7b2390d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20724: [SPARK-18630][PYTHON][ML] Move del method from JavaParam...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20724
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20725: [WIP][SPARK-23555][PYTHON] Add BinaryType support for Ar...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20725
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20725: [WIP][SPARK-23555][PYTHON] Add BinaryType support for Ar...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20725
  
**[Test build #87919 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87919/testReport)**
 for PR 20725 at commit 
[`afcb0d5`](https://github.com/apache/spark/commit/afcb0d5608c17d2fc004a0b6d4af4573abca4e4b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20725: [WIP][SPARK-23555][PYTHON] Add BinaryType support for Ar...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20725
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87919/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20208
  
**[Test build #87906 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87906/testReport)**
 for PR 20208 at commit 
[`6ae471c`](https://github.com/apache/spark/commit/6ae471c8ecaae3eb3888eecaac1c4e7552bedcc6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87906/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20718: [SPARK-23514][FOLLOW-UP] Remove more places using sparkC...

2018-03-02 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20718
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20723: [SPARK-23538][core] Remove custom configuration for SSL ...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20723
  
**[Test build #87907 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87907/testReport)**
 for PR 20723 at commit 
[`c83611e`](https://github.com/apache/spark/commit/c83611eca573f3f460790f4fde7bea7ef7887839).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20618
  
**[Test build #87914 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87914/testReport)**
 for PR 20618 at commit 
[`627e204`](https://github.com/apache/spark/commit/627e204ed03cfd6caa06e8f64dc605b62f4d2e5e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20618
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20618
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87914/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20710
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87917/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20710
  
**[Test build #87917 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87917/testReport)**
 for PR 20710 at commit 
[`9fb74e2`](https://github.com/apache/spark/commit/9fb74e2ccbe668ac6a1f2d2240b67a04400ba78b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20725: [WIP][SPARK-23555][PYTHON] Add BinaryType support for Ar...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20725
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1248/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20724: [SPARK-18630][PYTHON][ML] Move del method from JavaParam...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20724
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20725: [WIP][SPARK-23555][PYTHON] Add BinaryType support for Ar...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20725
  
**[Test build #87919 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87919/testReport)**
 for PR 20725 at commit 
[`afcb0d5`](https://github.com/apache/spark/commit/afcb0d5608c17d2fc004a0b6d4af4573abca4e4b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20725: [WIP][SPARK-23555][PYTHON] Add BinaryType support for Ar...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20725
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20725: [WIP][SPARK-23555][PYTHON] Add BinaryType support...

2018-03-02 Thread BryanCutler
GitHub user BryanCutler opened a pull request:

https://github.com/apache/spark/pull/20725

[WIP][SPARK-23555][PYTHON] Add BinaryType support for Arrow

## What changes were proposed in this pull request?

Adding `BinaryType` support for Arrow in pyspark.

## How was this patch tested?

Additional unit tests in pyspark for code paths that use Arrow


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BryanCutler/spark arrow-binary-type-support-SPARK-23555

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20725.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20725


commit afcb0d5608c17d2fc004a0b6d4af4573abca4e4b
Author: Bryan Cutler 
Date:   2018-03-03T00:32:45Z

added support for binary type, not currently working due to arrow error




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20725: [WIP][SPARK-23555][PYTHON] Add BinaryType support for Ar...

2018-03-02 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20725
  
This is a WIP: some issues need to be worked out on the Arrow side, and tests are still needed for pandas_udfs. Currently I get the following error when converting 
from pandas:
```
File "/home/bryan/git/spark/python/pyspark/serializers.py", line 237, in 
create_array
return pa.Array.from_pandas(s, mask=mask, type=t)
  File "array.pxi", line 335, in pyarrow.lib.Array.from_pandas
  File "array.pxi", line 170, in pyarrow.lib.array
  File "array.pxi", line 70, in pyarrow.lib._ndarray_to_array
  File "error.pxi", line 85, in pyarrow.lib.check_status
ArrowNotImplementedError: No cast implemented from binary to binary
```
The corresponding JIRA is https://issues.apache.org/jira/browse/ARROW-2141


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20724: [SPARK-18630][PYTHON][ML] Move del method from JavaParam...

2018-03-02 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/20724
  
add to whitelist


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/20698
  
Thank you. Merging to master only as this is a new feature touching 
production code paths.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20723: [SPARK-23538][core] Remove custom configuration for SSL ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20723
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87907/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20458: changed scala example from java "style" to scala

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20458
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20723: [SPARK-23538][core] Remove custom configuration for SSL ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20723
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


