[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3675/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91265/
Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91265 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91265/testReport)**
 for PR 20697 at commit 
[`5a8fd7f`](https://github.com/apache/spark/commit/5a8fd7ff400bcf44e94800905bfa48715eaba88c).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91265 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91265/testReport)**
 for PR 20697 at commit 
[`5a8fd7f`](https://github.com/apache/spark/commit/5a8fd7ff400bcf44e94800905bfa48715eaba88c).


---




[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

2018-05-29 Thread ssuchter
Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191486261
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/resources/log4j.properties
 ---
@@ -0,0 +1,31 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set everything to be logged to the file target/integration-tests.log
+log4j.rootCategory=INFO, file
--- End diff --

It's still necessary; it does not inherit.
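For context, the `rootCategory` line under discussion only takes effect together with a matching appender definition; a typical log4j 1.x setup for this file would look like the sketch below (the appender details are illustrative, not quoted from the PR):

```properties
# Route everything at INFO and above to a file appender named "file"
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.append=true
log4j.appender.file.file=target/integration-tests.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %p %c{1}: %m%n
```

Because each module's test resources are loaded independently, a new module's log4j.properties does not inherit settings from the parent project, which is why the file is still needed here.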


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91252/
Test FAILed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91253/
Test PASSed.


---




[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #91252 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91252/testReport)**
 for PR 13599 at commit 
[`44500fc`](https://github.com/apache/spark/commit/44500fc0d66bd930cc12ba6b66985e08f61d9ecc).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)`
  * `  class DriverEndpoint(override val rpcEnv: RpcEnv)`


---




[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21449
  
**[Test build #91253 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91253/testReport)**
 for PR 21449 at commit 
[`92cb513`](https://github.com/apache/spark/commit/92cb513416c5dd0e9fa690c25cfae0565471a5e1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

2018-05-29 Thread ssuchter
Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191485369
  
--- Diff: resource-managers/kubernetes/integration-tests/pom.xml ---
@@ -0,0 +1,230 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  4.0.0
+  
+org.apache.spark
+spark-parent_2.11
+2.4.0-SNAPSHOT
+../../../pom.xml
+  
+
+  spark-kubernetes-integration-tests_2.11
+  spark-kubernetes-integration-tests
+  
+3.3.9
+3.5
+1.1.1
+5.0.2
+1.3.0
+1.4.0
+
+18.0
+1.3.9
+3.0.0
+1.2.17
+2.11.8
+2.11
+3.2.2
+2.2.6
+1.0
+1.7.24
+kubernetes-integration-tests
+
${project.build.directory}/spark-dist-unpacked
+N/A
+
${project.build.directory}/imageTag.txt
+
minikube
+
docker.io/kubespark
+
+  
+  jar
+  Spark Project Kubernetes Integration Tests
+
+  
+
+  org.apache.spark
+  spark-core_${scala.binary.version}
+  ${project.version}
+
+
+
+  org.apache.spark
+  spark-core_${scala.binary.version}
+  ${project.version}
+  test-jar
+  test
+
+
+
+  commons-logging
+  commons-logging
+  ${commons-logging.version}
+
+
+  com.google.code.findbugs
+  jsr305
+  ${jsr305.version}
+
+
+  com.google.guava
+  guava
+  test
+  
+  ${guava.version}
+
+
+  com.spotify
+  docker-client
+  ${docker-client.version}
+  test
+
+
+  io.fabric8
+  kubernetes-client
+  ${kubernetes-client.version}
+
+
+  log4j
+  log4j
+  ${log4j.version}
+
+
+  org.apache.commons
+  commons-lang3
+  ${commons-lang3.version}
+
+
+  org.scala-lang
+  scala-library
+  ${scala.version}
+
+
+  org.scalatest
+  scalatest_${scala.binary.version}
+  test
+
+
+  org.slf4j
+  slf4j-log4j12
+  ${slf4j-log4j12.version}
+  test
+
+  
+
+  
+
+  
+net.alchim31.maven
--- End diff --

Resolved in 5a8fd7ff40


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91264 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91264/testReport)**
 for PR 20697 at commit 
[`dd28032`](https://github.com/apache/spark/commit/dd280327ea0cec6d1643007679d189529c4bc1db).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91264/
Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3674/
Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91264 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91264/testReport)**
 for PR 20697 at commit 
[`dd28032`](https://github.com/apache/spark/commit/dd280327ea0cec6d1643007679d189529c4bc1db).


---




[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

2018-05-29 Thread ssuchter
Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191483976
  
--- Diff: resource-managers/kubernetes/integration-tests/pom.xml ---
@@ -0,0 +1,230 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  4.0.0
+  
+org.apache.spark
+spark-parent_2.11
+2.4.0-SNAPSHOT
+../../../pom.xml
+  
+
+  spark-kubernetes-integration-tests_2.11
+  spark-kubernetes-integration-tests
+  
+3.3.9
+3.5
+1.1.1
+5.0.2
+1.3.0
+1.4.0
+
+18.0
+1.3.9
+3.0.0
+1.2.17
+2.11.8
+2.11
+3.2.2
+2.2.6
+1.0
+1.7.24
+kubernetes-integration-tests
+
${project.build.directory}/spark-dist-unpacked
+N/A
+
${project.build.directory}/imageTag.txt
+
minikube
+
docker.io/kubespark
+
+  
+  jar
+  Spark Project Kubernetes Integration Tests
+
+  
+
+  org.apache.spark
+  spark-core_${scala.binary.version}
+  ${project.version}
+
+
+
+  org.apache.spark
+  spark-core_${scala.binary.version}
+  ${project.version}
+  test-jar
+  test
+
+
+
+  commons-logging
+  commons-logging
+  ${commons-logging.version}
+
+
+  com.google.code.findbugs
+  jsr305
+  ${jsr305.version}
+
+
+  com.google.guava
+  guava
+  test
+  
+  ${guava.version}
+
+
+  com.spotify
+  docker-client
+  ${docker-client.version}
+  test
+
+
+  io.fabric8
+  kubernetes-client
+  ${kubernetes-client.version}
+
+
+  log4j
+  log4j
+  ${log4j.version}
--- End diff --

Resolved in 901edb3ba3, f68cdba77b, dd280327ea


---




[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

2018-05-29 Thread ssuchter
Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191483921
  
--- Diff: resource-managers/kubernetes/integration-tests/pom.xml ---
@@ -0,0 +1,230 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  4.0.0
+  
+org.apache.spark
+spark-parent_2.11
+2.4.0-SNAPSHOT
+../../../pom.xml
+  
+
+  spark-kubernetes-integration-tests_2.11
+  spark-kubernetes-integration-tests
+  
+3.3.9
+3.5
--- End diff --

Resolved in 901edb3ba3, f68cdba77b, dd280327ea


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191483040
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
 spark = SparkSession.builder.getOrCreate()
 sc = spark.sparkContext
 
-wrapped_func = _wrap_function(sc, self.func, self.returnType)
+func = fail_on_stopiteration(self.func)
+
+# for pandas UDFs the worker needs to know if the function takes
+# one or two arguments, but the signature is lost when wrapping 
with
+# fail_on_stopiteration, so we store it here
+if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF):
+func._argspec = _get_argspec(self.func)
--- End diff --

I see. I recall that we still support pandas UDFs with Python 2. Does the 
resolution here not work with pandas UDFs under Python 2?


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91263 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91263/testReport)**
 for PR 20697 at commit 
[`f68cdba`](https://github.com/apache/spark/commit/f68cdba77b323bb829ee69a16b1d3688b45b3129).


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3673/
Test FAILed.


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191481925
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
 spark = SparkSession.builder.getOrCreate()
 sc = spark.sparkContext
 
-wrapped_func = _wrap_function(sc, self.func, self.returnType)
+func = fail_on_stopiteration(self.func)
+
+# for pandas UDFs the worker needs to know if the function takes
+# one or two arguments, but the signature is lost when wrapping 
with
+# fail_on_stopiteration, so we store it here
+if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF):
+func._argspec = _get_argspec(self.func)
--- End diff --

That seems to be difficult, in particular in Python 2. I'm not aware of a 
clean and straightforward way to copy the signature in Python 2 without a hack.
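The signature loss under discussion is easy to demonstrate. In the sketch below, `fail_on_stopiteration` is a simplified stand-in for the pyspark helper of the same name, and the `_argspec` attribute mirrors the `func._argspec = _get_argspec(self.func)` line in the diff:

```python
import inspect

def fail_on_stopiteration(f):
    # Plain (*args, **kwargs) wrapper, as a Python-2-compatible helper would
    # be: re-raise StopIteration from the user function as RuntimeError so it
    # cannot silently terminate an outer iterator.
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError("StopIteration raised inside a UDF") from exc
    return wrapper

def multiply(a, b):
    return a * b

wrapped = fail_on_stopiteration(multiply)

# The wrapper's own argspec is just (*args, **kwargs); the original
# two-argument signature is no longer recoverable from `wrapped` itself.
assert inspect.getfullargspec(wrapped).args == []

# So the original argspec is captured up front and stashed on the wrapper,
# which is what the diff does for pandas UDFs.
wrapped._argspec = inspect.getfullargspec(multiply)
assert wrapped._argspec.args == ["a", "b"]
assert wrapped(6, 7) == 42
```

This side-steps the Python 2 limitation: rather than trying to copy the signature onto the wrapper, the worker reads the stored `_argspec` directly.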


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191481196
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
 spark = SparkSession.builder.getOrCreate()
 sc = spark.sparkContext
 
-wrapped_func = _wrap_function(sc, self.func, self.returnType)
+func = fail_on_stopiteration(self.func)
+
+# for pandas UDFs the worker needs to know if the function takes
+# one or two arguments, but the signature is lost when wrapping 
with
+# fail_on_stopiteration, so we store it here
+if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF):
+func._argspec = _get_argspec(self.func)
--- End diff --

I see. I saw @HyukjinKwon's comment here: 
https://github.com/apache/spark/pull/21383#discussion_r191441634



---




[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21450
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91261/
Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91261 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91261/testReport)**
 for PR 20697 at commit 
[`901edb3`](https://github.com/apache/spark/commit/901edb3ba3a566e5c6737d15e197950abce5131c).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3672/
Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit exec...

2018-05-29 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request:

https://github.com/apache/spark/pull/21450

[SPARK-24319][SPARK SUBMIT] Fix spark-submit execution where no main class 
is required.

## What changes were proposed in this pull request?

Since [PR 20925](https://github.com/apache/spark/pull/20925), it is no longer 
possible to execute the following commands:
* run-example --help
* run-example --version
* run-example --usage-error
* run-example --status ...
* run-example --kill ...

This PR allows execution of the commands listed above.

## How was this patch tested?

Extended the existing unit tests and added new ones.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gaborgsomogyi/spark SPARK-24319

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21450.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21450


commit a69850b6fdcbe2e234e70a597d9ad6beae6a6937
Author: Gabor Somogyi 
Date:   2018-05-29T15:38:56Z

[SPARK-24319][SPARK SUBMIT] Fix spark-submit execution where no main class 
is required.




---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91261 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91261/testReport)**
 for PR 20697 at commit 
[`901edb3`](https://github.com/apache/spark/commit/901edb3ba3a566e5c6737d15e197950abce5131c).


---




[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21346
  
**[Test build #4190 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4190/testReport)**
 for PR 21346 at commit 
[`7bd1b43`](https://github.com/apache/spark/commit/7bd1b43c81a3cdd7b88cf64994cfe8f2b3c5fdf8).


---




[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21383
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21383
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91257/
Test PASSed.


---




[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21383
  
**[Test build #91257 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91257/testReport)**
 for PR 21383 at commit 
[`8fac2a8`](https://github.com/apache/spark/commit/8fac2a80deb79030dee161e0d86b7b090bc892a7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21440
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3671/
Test PASSed.


---




[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21440
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...

2018-05-29 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21440
  
Thanks for the reviews @markhamstra @Ngone51; I've updated the PR.


---




[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...

2018-05-29 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21440#discussion_r191476566
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -659,6 +659,11 @@ private[spark] class BlockManager(
* Get block from remote block managers as serialized bytes.
*/
   def getRemoteBytes(blockId: BlockId): Option[ChunkedByteBuffer] = {
+// TODO if we change this method to return the ManagedBuffer, then 
getRemoteValues
+// could just use the inputStream on the temp file, rather than 
memory-mapping the file.
+// Until then, replication can cause the process to use too much 
memory and get killed
+// by the OS / cluster manager (not a java OOM, since its a 
memory-mapped file) even though
+// we've read the data to disk.
--- End diff --

Btw, this fix is such low-hanging fruit that I would try to do it 
immediately afterwards. (I haven't filed a JIRA yet just because there are 
already so many defunct JIRAs related to this; I was going to wait until my 
changes got some traction.)

I think it's OK to get it in like this first, as this makes the behavior for 
2.01 GB basically the same as it was for 1.99 GB.
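The tradeoff the TODO describes can be sketched in plain Python (file size and chunk size here are illustrative; Spark's actual path works with a `ManagedBuffer` over the temp file and a `ChunkedByteBuffer`):

```python
import mmap
import os
import tempfile

# A block fetched from a remote executor has been spilled to a temp file;
# the question is how to read it back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * (1 << 20))  # 1 MiB stand-in for a cached block

# Option 1: memory-map the file. Every page touched counts as process
# memory for the OS / cluster manager, which is how a large replication
# can get the process killed even though the data is already on disk.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0,
                                      access=mmap.ACCESS_READ) as m:
    mapped_head = m[:4]

# Option 2: stream through the file with a small fixed buffer, which is
# what returning the ManagedBuffer would enable for getRemoteValues.
total = 0
with open(path, "rb") as f:
    while True:
        chunk = f.read(64 * 1024)
        if not chunk:
            break
        total += len(chunk)

os.remove(path)
```

Both reads see the same bytes, but only the streaming variant keeps the resident footprint bounded by the buffer size rather than the block size.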


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91260 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91260/testReport)**
 for PR 20697 at commit 
[`cfb8ee9`](https://github.com/apache/spark/commit/cfb8ee94e11b4871f9b8c7db4774bdb6cb42c40e).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91260/
Test FAILed.


---




[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21440
  
**[Test build #91259 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91259/testReport)**
 for PR 21440 at commit 
[`a9cfe29`](https://github.com/apache/spark/commit/a9cfe294b15b2c9675d074645865fa403285d1d2).


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91260 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91260/testReport)**
 for PR 20697 at commit 
[`cfb8ee9`](https://github.com/apache/spark/commit/cfb8ee94e11b4871f9b8c7db4774bdb6cb42c40e).


---




[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21442
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3670/
Test PASSed.


---




[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21442
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

2018-05-29 Thread ssuchter
Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191474516
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/Logging.scala
 ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.integrationtest
+
+import org.apache.log4j.{Logger, LogManager, Priority}
+
+trait Logging {
--- End diff --

Resolved in cfb8ee94e11b4871f9b8c7db4774bdb6cb42c40e


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191473968
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -900,6 +900,22 @@ def __call__(self, x):
 self.assertEqual(f, f_.func)
 self.assertEqual(return_type, f_.returnType)
 
+def test_stopiteration_in_udf(self):
--- End diff --

Yes please, I am not really familiar with UDFs in general


---




[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...

2018-05-29 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21440#discussion_r191472949
  
--- Diff: 
core/src/test/scala/org/apache/spark/io/ChunkedByteBufferFileRegionSuite.scala 
---
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io
+
+import java.nio.ByteBuffer
+import java.nio.channels.WritableByteChannel
+
+import scala.util.Random
+
+import org.mockito.Mockito.when
+import org.scalatest.BeforeAndAfterEach
+import org.scalatest.mockito.MockitoSugar
+
+import org.apache.spark.{SparkConf, SparkEnv, SparkFunSuite}
+import org.apache.spark.internal.config
+import org.apache.spark.util.io.ChunkedByteBuffer
+
+class ChunkedByteBufferFileRegionSuite extends SparkFunSuite with 
MockitoSugar
+with BeforeAndAfterEach {
+
+  override protected def beforeEach(): Unit = {
+super.beforeEach()
+val conf = new SparkConf()
+val env = mock[SparkEnv]
+SparkEnv.set(env)
+when(env.conf).thenReturn(conf)
+  }
+
+  override protected def afterEach(): Unit = {
+SparkEnv.set(null)
+  }
+
+  private def generateChunkByteBuffer(nChunks: Int, perChunk: Int): 
ChunkedByteBuffer = {
+val bytes = (0 until nChunks).map { chunkIdx =>
+  val bb = ByteBuffer.allocate(perChunk)
+  (0 until perChunk).foreach { idx =>
+bb.put((chunkIdx * perChunk + idx).toByte)
+  }
+  bb.position(0)
+  bb
+}.toArray
+new ChunkedByteBuffer(bytes)
+  }
+
+  test("transferTo can stop and resume correctly") {
+SparkEnv.get.conf.set(config.BUFFER_WRITE_CHUNK_SIZE, 9L)
+val cbb = generateChunkByteBuffer(4, 10)
+val fileRegion = cbb.toNetty
+
+val targetChannel = new LimitedWritableByteChannel(40)
+
+var pos = 0L
+// write the fileregion to the channel, but with the transfer limited 
at various spots along
+// the way.
+
+// limit to within the first chunk
+targetChannel.acceptNBytes = 5
+pos = fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 5)
+
+// a little bit further within the first chunk
+targetChannel.acceptNBytes = 2
+pos += fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 7)
+
+// past the first chunk, into the 2nd
+targetChannel.acceptNBytes = 6
+pos += fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 13)
+
+// right to the end of the 2nd chunk
+targetChannel.acceptNBytes = 7
+pos += fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 20)
+
+// rest of 2nd chunk, all of 3rd, some of 4th
+targetChannel.acceptNBytes = 15
+pos += fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 35)
+
+// now till the end
+targetChannel.acceptNBytes = 5
+pos += fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 40)
+
+// calling again at the end should be OK
+targetChannel.acceptNBytes = 20
+fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 40)
+  }
+
+  test(s"transfer to with random limits") {
+val rng = new Random()
+val seed = System.currentTimeMillis()
+logInfo(s"seed = $seed")
+rng.setSeed(seed)
+val chunkSize = 1e4.toInt
+SparkEnv.get.conf.set(config.BUFFER_WRITE_CHUNK_SIZE, 
rng.nextInt(chunkSize).toLong)
+
+val cbb = generateChunkByteBuffer(50, chunkSize)
+val fileRegion = cbb.toNetty
+val transferLimit = 1e5.toInt
+val targetChannel = new LimitedWritableByteChannel(transferLimit)
+while (targetChannel.pos < cbb.size) {
+  val nextTransferSize = rng.nextInt(transferLimit)
+  targetChannel.acceptNBytes = nextTransferSize
+  fileR

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191472884
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
 spark = SparkSession.builder.getOrCreate()
 sc = spark.sparkContext
 
-wrapped_func = _wrap_function(sc, self.func, self.returnType)
+func = fail_on_stopiteration(self.func)
+
+# for pandas UDFs the worker needs to know if the function takes
+# one or two arguments, but the signature is lost when wrapping 
with
+# fail_on_stopiteration, so we store it here
+if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF):
+func._argspec = _get_argspec(self.func)
--- End diff --

I am not sure how to do that, though; can you suggest a way? Originally 
this hack was in `fail_on_stopiteration`, but then it was decided to restrict 
its scope as much as possible.
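To see why the diff stores `_argspec` separately, here is a minimal standalone sketch of the wrapping problem (the wrapper body and the RuntimeError message are illustrative, not PySpark's exact implementation):

```python
import inspect

def fail_on_stopiteration(f):
    # Wrap f so that a StopIteration escaping user code surfaces as a
    # RuntimeError instead of silently terminating the consuming loop.
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError("StopIteration raised in user code", exc)
    return wrapper

def my_udf(a, b):
    return a + b

wrapped = fail_on_stopiteration(my_udf)

# The wrapper only exposes (*args, **kwargs), so the original arity is lost:
assert inspect.getfullargspec(wrapped).args == []

# Storing the original argspec on the wrapper (as the diff does with
# func._argspec) lets later code recover the one-vs-two-argument distinction:
wrapped._argspec = inspect.getfullargspec(my_udf)
assert wrapped._argspec.args == ["a", "b"]
```

As the review thread notes, there seems to be no way to copy a function's signature onto a wrapper in Python 2, which is why the argspec is carried explicitly rather than recovered from the wrapper.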


---




[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...

2018-05-29 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21440#discussion_r191472540
  
--- Diff: 
core/src/test/scala/org/apache/spark/io/ChunkedByteBufferFileRegionSuite.scala 
---
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io
+
+import java.nio.ByteBuffer
+import java.nio.channels.WritableByteChannel
+
+import scala.util.Random
+
+import org.mockito.Mockito.when
+import org.scalatest.BeforeAndAfterEach
+import org.scalatest.mockito.MockitoSugar
+
+import org.apache.spark.{SparkConf, SparkEnv, SparkFunSuite}
+import org.apache.spark.internal.config
+import org.apache.spark.util.io.ChunkedByteBuffer
+
+class ChunkedByteBufferFileRegionSuite extends SparkFunSuite with 
MockitoSugar
+with BeforeAndAfterEach {
+
+  override protected def beforeEach(): Unit = {
+super.beforeEach()
+val conf = new SparkConf()
+val env = mock[SparkEnv]
+SparkEnv.set(env)
+when(env.conf).thenReturn(conf)
+  }
+
+  override protected def afterEach(): Unit = {
+SparkEnv.set(null)
+  }
+
+  private def generateChunkByteBuffer(nChunks: Int, perChunk: Int): 
ChunkedByteBuffer = {
+val bytes = (0 until nChunks).map { chunkIdx =>
+  val bb = ByteBuffer.allocate(perChunk)
+  (0 until perChunk).foreach { idx =>
+bb.put((chunkIdx * perChunk + idx).toByte)
+  }
+  bb.position(0)
+  bb
+}.toArray
+new ChunkedByteBuffer(bytes)
+  }
+
+  test("transferTo can stop and resume correctly") {
+SparkEnv.get.conf.set(config.BUFFER_WRITE_CHUNK_SIZE, 9L)
+val cbb = generateChunkByteBuffer(4, 10)
+val fileRegion = cbb.toNetty
+
+val targetChannel = new LimitedWritableByteChannel(40)
+
+var pos = 0L
+// write the fileregion to the channel, but with the transfer limited 
at various spots along
+// the way.
+
+// limit to within the first chunk
+targetChannel.acceptNBytes = 5
+pos = fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 5)
+
+// a little bit further within the first chunk
+targetChannel.acceptNBytes = 2
+pos += fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 7)
+
+// past the first chunk, into the 2nd
+targetChannel.acceptNBytes = 6
+pos += fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 13)
+
+// right to the end of the 2nd chunk
+targetChannel.acceptNBytes = 7
+pos += fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 20)
+
+// rest of 2nd chunk, all of 3rd, some of 4th
+targetChannel.acceptNBytes = 15
+pos += fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 35)
+
+// now till the end
+targetChannel.acceptNBytes = 5
+pos += fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 40)
+
+// calling again at the end should be OK
+targetChannel.acceptNBytes = 20
+fileRegion.transferTo(targetChannel, pos)
+assert(targetChannel.pos === 40)
+  }
+
+  test(s"transfer to with random limits") {
+val rng = new Random()
+val seed = System.currentTimeMillis()
+logInfo(s"seed = $seed")
+rng.setSeed(seed)
+val chunkSize = 1e4.toInt
+SparkEnv.get.conf.set(config.BUFFER_WRITE_CHUNK_SIZE, 
rng.nextInt(chunkSize).toLong)
+
+val cbb = generateChunkByteBuffer(50, chunkSize)
+val fileRegion = cbb.toNetty
+val transferLimit = 1e5.toInt
+val targetChannel = new LimitedWritableByteChannel(transferLimit)
+while (targetChannel.pos < cbb.size) {
+  val nextTransferSize = rng.nextInt(transferLimit)
+  targetChannel.acceptNBytes = nextTransferSize
+  fileR

[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...

2018-05-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21442
  
LGTM, pending jenkins


---




[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...

2018-05-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21442#discussion_r191471982
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -219,10 +219,15 @@ object ReorderAssociativeOperator extends 
Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case q: LogicalPlan => q transformExpressionsDown {
-  case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+  case In(v, list) if list.isEmpty =>
+// When v is not nullable, the following expression will be 
optimized
+// to FalseLiteral which is tested in OptimizeInSuite.scala
+If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
   case expr @ In(v, list) if expr.inSetConvertible =>
 val newList = ExpressionSet(list).toSeq
-if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) {
+if (newList.length == 1) {
--- End diff --

This is too minor, I'd like to keep the current code and not break the code 
flow.
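The null semantics behind the `If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))` rewrite quoted above can be checked with a tiny model, where Python's `None` stands in for SQL NULL (a sketch only, not Catalyst code):

```python
def empty_in(v):
    # Models the rewrite of In(v, list) when list is empty:
    # `v IN ()` is NULL when v is NULL, and FALSE otherwise.
    return None if v is None else False

assert empty_in(42) is False
assert empty_in(None) is None
```

When `v` is not nullable, the `None` branch is dead, which is why the optimizer can further fold the expression to `FalseLiteral`, as the diff comment says.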


---




[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21442
  
**[Test build #91258 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91258/testReport)**
 for PR 21442 at commit 
[`7a354fc`](https://github.com/apache/spark/commit/7a354fcd154ec2d8f88a5c1fbf1bd75fdb15ec49).


---




[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...

2018-05-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21442
  
retest this please


---




[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...

2018-05-29 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21440#discussion_r191471289
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferFileRegion.scala 
---
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.io
+
+import java.nio.channels.WritableByteChannel
+
+import io.netty.channel.FileRegion
+import io.netty.util.AbstractReferenceCounted
+
+import org.apache.spark.internal.Logging
+
+
+/**
+ * This exposes a ChunkedByteBuffer as a netty FileRegion, just to allow 
sending > 2gb in one netty
+ * message.   This is because netty cannot send a ByteBuf > 2g, but it can 
send a large FileRegion,
+ * even though the data is not backed by a file.
+ */
+private[io] class ChunkedByteBufferFileRegion(
+val chunkedByteBuffer: ChunkedByteBuffer,
+val ioChunkSize: Int) extends AbstractReferenceCounted with FileRegion 
with Logging {
+
+  private var _transferred: Long = 0
+  // this duplicates the original chunks, so we're free to modify the 
position, limit, etc.
+  private val chunks = chunkedByteBuffer.getChunks()
+  private val cumLength = chunks.scanLeft(0L) { _ + _.remaining()}
+  private val size = cumLength.last
+  // Chunk size in bytes
+
+  protected def deallocate: Unit = {}
+
+  override def count(): Long = chunkedByteBuffer.size
--- End diff --

no difference, `count()` is just to satisfy an interface.  My mistake for 
having them look different, I'll make them the same
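To make the resumable-transfer contract concrete, here is a toy Python model of the FileRegion behavior exercised by the suite above (class and method names are illustrative, not the Scala API):

```python
class LimitedSink:
    """Toy stand-in for the test's LimitedWritableByteChannel: accepts at
    most accept_n bytes per round, like a non-blocking channel."""
    def __init__(self):
        self.data = bytearray()
        self.accept_n = 0

    def write(self, buf):
        n = min(self.accept_n, len(buf))
        self.data += buf[:n]
        self.accept_n -= n
        return n

class ChunkedRegion:
    """Sketch of a chunked buffer exposed as a resumable region: transfer_to
    can stop mid-chunk and resume from any position, mirroring what
    ChunkedByteBufferFileRegion.transferTo must guarantee."""
    def __init__(self, chunks):
        self.chunks = chunks  # list of bytes objects
        self.size = sum(len(c) for c in chunks)

    def transfer_to(self, sink, position):
        # Write bytes starting at `position`; return how many were written
        # this round, so the caller can resume at position + returned.
        written, offset = 0, 0
        for chunk in self.chunks:
            end = offset + len(chunk)
            if position + written < end:
                start = position + written - offset
                n = sink.write(chunk[start:])
                written += n
                if start + n < len(chunk):
                    break  # sink stopped accepting mid-chunk
            offset = end
        return written

region = ChunkedRegion([bytes(range(10)), bytes(range(10, 20))])
sink = LimitedSink()
pos = 0
sink.accept_n = 5          # stop within the first chunk
pos += region.transfer_to(sink, pos)
sink.accept_n = 8          # cross into the second chunk
pos += region.transfer_to(sink, pos)
sink.accept_n = 20         # drain the rest
pos += region.transfer_to(sink, pos)
assert pos == region.size and bytes(sink.data) == bytes(range(20))
```

The key invariant, as in the Scala test, is that repeated partial calls reproduce the bytes exactly once and in order, and that a call at the end of the region writes nothing.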


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191470695
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -900,6 +900,22 @@ def __call__(self, x):
 self.assertEqual(f, f_.func)
 self.assertEqual(return_type, f_.returnType)
 
+def test_stopiteration_in_udf(self):
--- End diff --

We can also merge this PR first. I can follow up with the pandas udf tests.


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191469832
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -900,6 +900,22 @@ def __call__(self, x):
 self.assertEqual(f, f_.func)
 self.assertEqual(return_type, f_.returnType)
 
+def test_stopiteration_in_udf(self):
--- End diff --

Can we also add tests for pandas_udf?


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191469562
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
 spark = SparkSession.builder.getOrCreate()
 sc = spark.sparkContext
 
-wrapped_func = _wrap_function(sc, self.func, self.returnType)
+func = fail_on_stopiteration(self.func)
+
+# for pandas UDFs the worker needs to know if the function takes
+# one or two arguments, but the signature is lost when wrapping 
with
+# fail_on_stopiteration, so we store it here
+if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF,
+ PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF):
+func._argspec = _get_argspec(self.func)
--- End diff --

Does it make sense for `fail_on_stopiteration` to keep the function 
signature instead of restoring it here?


---




[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21383
  
still lgtm if the tests pass


---




[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21383
  
**[Test build #91257 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91257/testReport)**
 for PR 21383 at commit 
[`8fac2a8`](https://github.com/apache/spark/commit/8fac2a80deb79030dee161e0d86b7b090bc892a7).


---




[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...

2018-05-29 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/21445
  
Sure, we need to support this, but the approach in this PR doesn't work if 
it breaks existing tests.

I think the best way to do it is to make the shuffle writer responsible for 
incrementing the epoch within its task, the same way the data source writer 
does currently.


---




[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread daniel-shields
Github user daniel-shields commented on the issue:

https://github.com/apache/spark/pull/21449
  
I'm not sure that this behavior should be applied to all binary 
comparisons. It could result in unexpected behavior in some rare cases. For 
example:
`df1.join(df2, df2['x'] < df1['x'])`
If 'x' is ambiguous, this would result in the comparison being flipped.


---




[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...

2018-05-29 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21445
  
@LiangchangZ 

> In the real CP situation, reader and writer may be always in different 
tasks, right?

Continuous mode already supports some valid use cases, and putting everything in 
one task would be fastest in such use cases, though tasks can still be parallelized by 
partition. Unless we have a valid reason to separate reader and writer even in a 
non-shuffle query, it would be better to keep it as it is.


---




[GitHub] spark pull request #21378: [SPARK-24326][Mesos] add support for local:// sch...

2018-05-29 Thread skonto
Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/21378#discussion_r191449756
  
--- Diff: docs/running-on-mesos.md ---
@@ -753,6 +753,16 @@ See the [configuration page](configuration.html) for 
information on Spark config
 spark.cores.max is reached
   
 
++
+  spark.mesos.appJar.local.resolution.mode
+  host
+  
+  Provides support for the local:/// scheme to reference the app jar 
resource in cluster mode.
+  If user uses a local resource (local:///path/to/jar) and the config 
option is not used it defaults to `host` eg. the mesos fetcher tries to get the 
resource from the host's file system.
+  If the value is unknown it prints a warning msg in the dispatcher logs 
and defaults to `host`.
+  If the value is `container` then spark submit in the container will use 
the jar in the container's path: /path/to/jar.
+  
--- End diff --

OK, I haven't built the docs; will update it.


---




[GitHub] spark pull request #21378: [SPARK-24326][Mesos] add support for local:// sch...

2018-05-29 Thread skonto
Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/21378#discussion_r191449547
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -418,17 +417,33 @@ private[spark] class MesosClusterScheduler(
 envBuilder.build()
   }
 
+  private def isContainerLocalAppJar(desc: MesosDriverDescription): 
Boolean = {
+val isLocalJar = desc.jarUrl.startsWith("local://")
+val isContainerLocal = 
desc.conf.getOption("spark.mesos.appJar.local.resolution.mode").exists {
+  case "container" => true
+  case "host" => false
+  case other =>
+logWarning(s"Unknown spark.mesos.appJar.local.resolution.mode 
$other, using host.")
+false
+  }
--- End diff --

What's wrong with exists? 
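For illustration only, the resolution logic under discussion can be rendered in Python roughly as follows (function and config-key names mirror the Scala diff; combining the mode with the `local://` check at the end is an assumption, since the quoted hunk is truncated):

```python
import logging

def is_container_local_app_jar(jar_url, conf):
    # Sketch of isContainerLocalAppJar: the jar is treated as
    # container-local only when it uses the local:// scheme AND the
    # resolution mode is explicitly set to "container". Anything else,
    # including an unknown value, falls back to host resolution.
    is_local_jar = jar_url.startswith("local://")
    mode = conf.get("spark.mesos.appJar.local.resolution.mode")
    if mode is None:
        return False  # option unset: default to host
    if mode == "container":
        return is_local_jar
    if mode != "host":
        logging.warning(
            "Unknown spark.mesos.appJar.local.resolution.mode %s, using host.",
            mode)
    return False
```

This is the behavior the docs hunk describes: `host` is the default, `container` opts in, and an unknown value warns and falls back to `host`.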


---




[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars

2018-05-29 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/21317
  
@mccheah gentle ping for merge.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21426
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91249/
Test PASSed.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21426
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21426
  
**[Test build #91249 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91249/testReport)**
 for PR 21426 at commit 
[`f015e0d`](https://github.com/apache/spark/commit/f015e0d587c8d9f8cd359fecc325a19362a59c55).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191441634
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
 spark = SparkSession.builder.getOrCreate()
 sc = spark.sparkContext
 
-wrapped_func = _wrap_function(sc, self.func, self.returnType)
+func = fail_on_stopiteration(self.func)
+
+# prevent inspect to fail
+# e.g. inspect.getargspec(sum) raises
+# TypeError: <built-in function sum> is not a Python function
+try:
+func._argspec = _get_argspec(self.func)
+except TypeError:
--- End diff --

Also, let's leave a comment saying that this argspec is used for Pandas 
UDFs and that the hack is to keep the original signature of the given function, 
since there seems to be no way to copy it in Python 2.


---




[GitHub] spark pull request #21432: [SPARK-24373][SQL] Add AnalysisBarrier to Relatio...

2018-05-29 Thread WenboZhao
Github user WenboZhao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21432#discussion_r191441199
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -63,17 +63,17 @@ class RelationalGroupedDataset protected[sql](
 groupType match {
   case RelationalGroupedDataset.GroupByType =>
 Dataset.ofRows(
-  df.sparkSession, Aggregate(groupingExprs, aliasedAgg, 
df.logicalPlan))
+  df.sparkSession, Aggregate(groupingExprs, aliasedAgg, 
df.planWithBarrier))
   case RelationalGroupedDataset.RollupType =>
 Dataset.ofRows(
-  df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), 
aliasedAgg, df.logicalPlan))
+  df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), 
aliasedAgg, df.planWithBarrier))
   case RelationalGroupedDataset.CubeType =>
 Dataset.ofRows(
-  df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, 
df.logicalPlan))
+  df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, 
df.planWithBarrier))
   case RelationalGroupedDataset.PivotType(pivotCol, values) =>
 val aliasedGrps = groupingExprs.map(alias)
 Dataset.ofRows(
-  df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, 
aggExprs, df.logicalPlan))
+  df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, 
aggExprs, df.planWithBarrier))
--- End diff --

ha~ Thanks!


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191437141
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
         spark = SparkSession.builder.getOrCreate()
         sc = spark.sparkContext
 
-        wrapped_func = _wrap_function(sc, self.func, self.returnType)
+        func = fail_on_stopiteration(self.func)
+
+        # prevent inspect to fail
+        # e.g. inspect.getargspec(sum) raises
+        # TypeError: <built-in function sum> is not a Python function
+        try:
+            func._argspec = _get_argspec(self.func)
+        except TypeError:
--- End diff --

Let's use all of Pandas UDFs:


https://github.com/apache/spark/blob/a9350d7095b79c8374fb4a06fd3f1a1a67615f6f/python/pyspark/sql/udf.py#L40-L67
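For context, a sketch of the guarded argspec capture under discussion (the helper name `attach_argspec` is hypothetical; Python 3 spells the introspection call `getfullargspec`):

```python
import inspect

def attach_argspec(wrapper, original):
    """Record the original function's argspec on the wrapper so later
    introspection sees the user's signature. Built-ins may not expose a
    Python-level signature, so a TypeError is tolerated."""
    try:
        wrapper._argspec = inspect.getfullargspec(original)
    except TypeError:
        pass  # e.g. some C built-ins cannot be introspected
    return wrapper
```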


---




[GitHub] spark issue #21405: [SPARK-24361][SQL] Polish code block manipulation API

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21405
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91251/
Test PASSed.


---




[GitHub] spark issue #21405: [SPARK-24361][SQL] Polish code block manipulation API

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21405
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20701
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3669/
Test PASSed.


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20701
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21405: [SPARK-24361][SQL] Polish code block manipulation API

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21405
  
**[Test build #91251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91251/testReport)** for PR 21405 at commit [`e30be7a`](https://github.com/apache/spark/commit/e30be7a3da9061a71e8ac222006e20b4e173cb74).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21260
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3539/



---




[GitHub] spark issue #21448: [SPARK-24408][SQL][DOC] Move abs, bitwiseNOT, isnan, nan...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21448
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91250/
Test PASSed.


---




[GitHub] spark issue #21448: [SPARK-24408][SQL][DOC] Move abs, bitwiseNOT, isnan, nan...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21448
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21448: [SPARK-24408][SQL][DOC] Move abs, bitwiseNOT, isnan, nan...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21448
  
**[Test build #91250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91250/testReport)** for PR 21448 at commit [`487a467`](https://github.com/apache/spark/commit/487a467219014ea2e322861c3053d2a374740058).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown

2018-05-29 Thread TomaszGaweda
Github user TomaszGaweda commented on the issue:

https://github.com/apache/spark/pull/21360
  
Hi @maryannxue, thanks for the PR! Could you please rebase it? 


---




[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21260
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3539/



---




[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21260
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3668/
Test PASSed.


---




[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21260
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20701
  
**[Test build #91256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91256/testReport)** for PR 20701 at commit [`e2f68ac`](https://github.com/apache/spark/commit/e2f68ac612227aaafa809ad5f5074d1984aa907e).


---




[GitHub] spark pull request #21432: [SPARK-24373][SQL] Add AnalysisBarrier to Relatio...

2018-05-29 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21432#discussion_r191429427
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -63,17 +63,17 @@ class RelationalGroupedDataset protected[sql](
     groupType match {
       case RelationalGroupedDataset.GroupByType =>
         Dataset.ofRows(
-          df.sparkSession, Aggregate(groupingExprs, aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(groupingExprs, aliasedAgg, df.planWithBarrier))
       case RelationalGroupedDataset.RollupType =>
         Dataset.ofRows(
-          df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), aliasedAgg, df.planWithBarrier))
       case RelationalGroupedDataset.CubeType =>
         Dataset.ofRows(
-          df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, df.planWithBarrier))
       case RelationalGroupedDataset.PivotType(pivotCol, values) =>
         val aliasedGrps = groupingExprs.map(alias)
         Dataset.ofRows(
-          df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, aggExprs, df.logicalPlan))
+          df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, aggExprs, df.planWithBarrier))
--- End diff --

It was already completed.


---




[GitHub] spark pull request #21432: [SPARK-24373][SQL] Add AnalysisBarrier to Relatio...

2018-05-29 Thread WenboZhao
Github user WenboZhao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21432#discussion_r191428684
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -63,17 +63,17 @@ class RelationalGroupedDataset protected[sql](
     groupType match {
       case RelationalGroupedDataset.GroupByType =>
         Dataset.ofRows(
-          df.sparkSession, Aggregate(groupingExprs, aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(groupingExprs, aliasedAgg, df.planWithBarrier))
       case RelationalGroupedDataset.RollupType =>
         Dataset.ofRows(
-          df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), aliasedAgg, df.planWithBarrier))
       case RelationalGroupedDataset.CubeType =>
         Dataset.ofRows(
-          df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, df.planWithBarrier))
       case RelationalGroupedDataset.PivotType(pivotCol, values) =>
         val aliasedGrps = groupingExprs.map(alias)
         Dataset.ofRows(
-          df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, aggExprs, df.logicalPlan))
+          df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, aggExprs, df.planWithBarrier))
--- End diff --

Any plan to make this fix complete?


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191427825
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
         spark = SparkSession.builder.getOrCreate()
         sc = spark.sparkContext
 
-        wrapped_func = _wrap_function(sc, self.func, self.returnType)
+        func = fail_on_stopiteration(self.func)
+
+        # prevent inspect to fail
+        # e.g. inspect.getargspec(sum) raises
+        # TypeError: <built-in function sum> is not a Python function
+        try:
+            func._argspec = _get_argspec(self.func)
+        except TypeError:
--- End diff --

Aha, you mean to do the hack in `UserDefinedFunction._create_judf`, and 
check the condition there? Sorry, I thought we were talking about `_create_udf`. 
Wilco then


---




[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21260
  
**[Test build #91255 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91255/testReport)** for PR 21260 at commit [`673fb83`](https://github.com/apache/spark/commit/673fb839cc5c8fa062b35eb6e1d1a9caa833b27b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21260
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91255/
Test FAILed.


---




[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21260
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21260
  
**[Test build #91255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91255/testReport)** for PR 21260 at commit [`673fb83`](https://github.com/apache/spark/commit/673fb839cc5c8fa062b35eb6e1d1a9caa833b27b).


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r191422043
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
         spark = SparkSession.builder.getOrCreate()
         sc = spark.sparkContext
 
-        wrapped_func = _wrap_function(sc, self.func, self.returnType)
+        func = fail_on_stopiteration(self.func)
+
+        # prevent inspect to fail
+        # e.g. inspect.getargspec(sum) raises
+        # TypeError: <built-in function sum> is not a Python function
+        try:
+            func._argspec = _get_argspec(self.func)
+        except TypeError:
--- End diff --

Eh, don't we have `self.evalType`, and couldn't we simply check that? I 
get that the current way is the "recommended" way to deal with divergence in 
Python, but let's just explicitly scope it here to make it easier to fix.


---




[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21422
  
There was a similar(?) argument before - 
https://github.com/apache/spark/pull/18873#issuecomment-320864941, FYI.

@gentlewangyu, let's just close this then. The gain is small anyway.


---



