[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3675/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91265/ Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91265/testReport)** for PR 20697 at commit [`5a8fd7f`](https://github.com/apache/spark/commit/5a8fd7ff400bcf44e94800905bfa48715eaba88c). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91265/testReport)** for PR 20697 at commit [`5a8fd7f`](https://github.com/apache/spark/commit/5a8fd7ff400bcf44e94800905bfa48715eaba88c).
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user ssuchter commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191486261 --- Diff: resource-managers/kubernetes/integration-tests/src/test/resources/log4j.properties --- @@ -0,0 +1,31 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# Set everything to be logged to the file target/integration-tests.log +log4j.rootCategory=INFO, file --- End diff -- It's still necessary; it does not inherit.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91252/ Test FAILed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test FAILed.
[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21449 Merged build finished. Test PASSed.
[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21449 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91253/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #91252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91252/testReport)** for PR 13599 at commit [`44500fc`](https://github.com/apache/spark/commit/44500fc0d66bd930cc12ba6b66985e08f61d9ecc). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)` * `class DriverEndpoint(override val rpcEnv: RpcEnv)`
[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21449 **[Test build #91253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91253/testReport)** for PR 21449 at commit [`92cb513`](https://github.com/apache/spark/commit/92cb513416c5dd0e9fa690c25cfae0565471a5e1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test FAILed.
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user ssuchter commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191485369 --- Diff: resource-managers/kubernetes/integration-tests/pom.xml --- @@ -0,0 +1,230 @@ [new pom.xml for the `spark-kubernetes-integration-tests_2.11` module: parent `spark-parent_2.11` 2.4.0-SNAPSHOT, version properties for its dependencies, and dependencies on spark-core (jar and test-jar), commons-logging, jsr305, guava, docker-client, kubernetes-client, log4j, commons-lang3, scala-library, scalatest, and slf4j-log4j12; the comment is anchored at the `net.alchim31.maven` plugin declaration] --- End diff -- Resolved in 5a8fd7ff40
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91264/testReport)** for PR 20697 at commit [`dd28032`](https://github.com/apache/spark/commit/dd280327ea0cec6d1643007679d189529c4bc1db). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91264/ Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3674/ Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91264/testReport)** for PR 20697 at commit [`dd28032`](https://github.com/apache/spark/commit/dd280327ea0cec6d1643007679d189529c4bc1db).
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user ssuchter commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191483976 --- Diff: resource-managers/kubernetes/integration-tests/pom.xml --- @@ -0,0 +1,230 @@ [new pom.xml for the `spark-kubernetes-integration-tests_2.11` module: parent `spark-parent_2.11` 2.4.0-SNAPSHOT, version properties for its dependencies, and dependencies on spark-core (jar and test-jar), commons-logging, jsr305, guava, docker-client, kubernetes-client, log4j, commons-lang3, scala-library, scalatest, and slf4j-log4j12; the comment is anchored at the log4j dependency's `${log4j.version}` line] --- End diff -- Resolved in 901edb3ba3, f68cdba77b, dd280327ea
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user ssuchter commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191483921 --- Diff: resource-managers/kubernetes/integration-tests/pom.xml --- @@ -0,0 +1,230 @@ [new pom.xml for the `spark-kubernetes-integration-tests_2.11` module: parent `spark-parent_2.11` 2.4.0-SNAPSHOT; the comment is anchored at the early version properties (3.3.9, 3.5)] --- End diff -- Resolved in 901edb3ba3, f68cdba77b, dd280327ea
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191483040 --- Diff: python/pyspark/sql/udf.py --- @@ -157,7 +157,17 @@ def _create_judf(self): spark = SparkSession.builder.getOrCreate() sc = spark.sparkContext -wrapped_func = _wrap_function(sc, self.func, self.returnType) +func = fail_on_stopiteration(self.func) + +# for pandas UDFs the worker needs to know if the function takes +# one or two arguments, but the signature is lost when wrapping with +# fail_on_stopiteration, so we store it here +if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF, + PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF, + PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF): +func._argspec = _get_argspec(self.func) --- End diff -- I see. I recall we still support pandas UDFs with Python 2? Does the resolution here not work with pandas UDFs and Python 2?
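The wrap-and-stash pattern under discussion can be sketched in miniature. This is a Python 3 illustration only: `fail_on_stopiteration` here is a simplified stand-in for the PR's helper (the real one, and `_get_argspec`, live in PySpark and also handle Python 2 via `getargspec`), and `multiply` is a hypothetical user function.

```python
import functools
import inspect

def fail_on_stopiteration(f):
    # Re-raise StopIteration from the user function as RuntimeError so it
    # cannot silently terminate an enclosing iterator in the worker.
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError("StopIteration raised in user function", exc)
    return wrapper

def multiply(a, b):  # hypothetical two-argument UDF
    return a * b

wrapped = fail_on_stopiteration(multiply)
# The wrapper itself only exposes (*args, **kwargs), so the original
# argspec is recorded on the wrapper for the worker to consult later.
wrapped._argspec = inspect.getfullargspec(multiply)

print(wrapped(6, 7))          # 42
print(wrapped._argspec.args)  # ['a', 'b']
```

The point of the stashed `_argspec` is exactly the comment in the diff: after wrapping, inspecting the wrapper no longer reveals whether the user function took one or two arguments.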
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91263/testReport)** for PR 20697 at commit [`f68cdba`](https://github.com/apache/spark/commit/f68cdba77b323bb829ee69a16b1d3688b45b3129).
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3673/ Test FAILed.
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191481925 --- Diff: python/pyspark/sql/udf.py --- @@ -157,7 +157,17 @@ def _create_judf(self): spark = SparkSession.builder.getOrCreate() sc = spark.sparkContext -wrapped_func = _wrap_function(sc, self.func, self.returnType) +func = fail_on_stopiteration(self.func) + +# for pandas UDFs the worker needs to know if the function takes +# one or two arguments, but the signature is lost when wrapping with +# fail_on_stopiteration, so we store it here +if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF, + PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF, + PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF): +func._argspec = _get_argspec(self.func) --- End diff -- That seems to be difficult, particularly in Python 2. I'm not aware of a clean, straightforward way to copy the signature in Python 2 without a hack.
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191481196 --- Diff: python/pyspark/sql/udf.py --- @@ -157,7 +157,17 @@ def _create_judf(self): spark = SparkSession.builder.getOrCreate() sc = spark.sparkContext -wrapped_func = _wrap_function(sc, self.func, self.returnType) +func = fail_on_stopiteration(self.func) + +# for pandas UDFs the worker needs to know if the function takes +# one or two arguments, but the signature is lost when wrapping with +# fail_on_stopiteration, so we store it here +if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF, + PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF, + PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF): +func._argspec = _get_argspec(self.func) --- End diff -- I see. I saw @HyukjinKwon's comment here: https://github.com/apache/spark/pull/21383#discussion_r191441634
[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21450 Can one of the admins verify this patch?
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91261/ Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91261/testReport)** for PR 20697 at commit [`901edb3`](https://github.com/apache/spark/commit/901edb3ba3a566e5c6737d15e197950abce5131c). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3672/ Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test FAILed.
[GitHub] spark pull request #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit exec...
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/21450 [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution where no main class is required. ## What changes were proposed in this pull request? After [PR 20925](https://github.com/apache/spark/pull/20925), it is no longer possible to execute the following commands: * run-example --help * run-example --version * run-example --usage-error * run-example --status ... * run-example --kill ... This PR allows execution of the mentioned commands. ## How was this patch tested? Existing unit tests were extended, and additional tests were written. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gaborgsomogyi/spark SPARK-24319 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21450.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21450 commit a69850b6fdcbe2e234e70a597d9ad6beae6a6937 Author: Gabor Somogyi Date: 2018-05-29T15:38:56Z [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution where no main class is required.
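The decision rule the PR describes can be sketched as follows. This is an illustrative stand-in only: Spark's actual logic lives in the Scala `SparkSubmit`/`SparkSubmitArguments` classes, and the set of flags and the sample arguments below are assumptions for the sketch, not Spark's real implementation.

```python
# Commands that are informational or control actions and therefore do not
# need a main class resolved before execution.
ACTIONS_WITHOUT_MAIN_CLASS = {"--help", "--version", "--usage-error",
                              "--kill", "--status"}

def requires_main_class(args):
    """Return True only when the invocation actually submits an
    application, i.e. none of the no-main-class actions is present."""
    return not any(arg in ACTIONS_WITHOUT_MAIN_CLASS for arg in args)

print(requires_main_class(["--help"]))                     # False
print(requires_main_class(["--kill", "driver-20180529"]))  # False
print(requires_main_class(["SparkPi", "100"]))             # True
```

Under this rule, `run-example --help` and friends proceed without a main class, while a normal example submission still requires one.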
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91261/testReport)** for PR 20697 at commit [`901edb3`](https://github.com/apache/spark/commit/901edb3ba3a566e5c6737d15e197950abce5131c).
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21346 **[Test build #4190 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4190/testReport)** for PR 21346 at commit [`7bd1b43`](https://github.com/apache/spark/commit/7bd1b43c81a3cdd7b88cf64994cfe8f2b3c5fdf8).
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21383 Merged build finished. Test PASSed.
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21383 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91257/ Test PASSed.
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21383 **[Test build #91257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91257/testReport)** for PR 21383 at commit [`8fac2a8`](https://github.com/apache/spark/commit/8fac2a80deb79030dee161e0d86b7b090bc892a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21440 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3671/ Test PASSed.
[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21440 Merged build finished. Test PASSed.
[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21440 Thanks for the reviews @markhamstra @Ngone51, I've updated the PR.
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r191476566 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -659,6 +659,11 @@ private[spark] class BlockManager( * Get block from remote block managers as serialized bytes. */ def getRemoteBytes(blockId: BlockId): Option[ChunkedByteBuffer] = { +// TODO if we change this method to return the ManagedBuffer, then getRemoteValues +// could just use the inputStream on the temp file, rather than memory-mapping the file. +// Until then, replication can cause the process to use too much memory and get killed +// by the OS / cluster manager (not a java OOM, since its a memory-mapped file) even though +// we've read the data to disk. --- End diff -- By the way, this fix is such low-hanging fruit that I would try to do it immediately afterwards. (I haven't filed a JIRA yet just because there are already so many defunct JIRAs related to this; I was going to wait until my changes got some traction.) I think it's OK to get it in like this first, as this makes the behavior for 2.01 GB basically the same as it was for 1.99 GB.
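The trade-off in the TODO above can be demonstrated in miniature. This is a hedged sketch unrelated to Spark's actual BlockManager code: memory-mapping a fetched-to-disk block and slicing it materializes every page in the process's address space (counted against a container's memory limit even though no JVM heap allocation occurred), while streaming from the file keeps only one small buffer live at a time.

```python
import mmap
import os
import tempfile

# Stand-in for a block that was fetched from a remote executor to disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\xab" * (1 << 20))  # 1 MiB of payload
    path = f.name

# Memory-mapped read: slicing the map touches every page, pulling the
# whole block into the process's resident set at once.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    mapped_len = len(mm[:])  # touches every page
    mm.close()

# Streaming read: only a 64 KiB buffer is held at any moment.
streamed_len = 0
with open(path, "rb") as f:
    while True:
        chunk = f.read(64 * 1024)
        if not chunk:
            break
        streamed_len += len(chunk)

os.remove(path)
print(mapped_len == streamed_len == (1 << 20))  # True
```

Both paths read the same bytes; the difference is peak resident memory, which is exactly why returning a `ManagedBuffer` backed by an input stream is suggested over memory-mapping.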
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Merged build finished. Test FAILed.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91260/testReport)** for PR 20697 at commit [`cfb8ee9`](https://github.com/apache/spark/commit/cfb8ee94e11b4871f9b8c7db4774bdb6cb42c40e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91260/ Test FAILed.
[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21440 **[Test build #91259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91259/testReport)** for PR 21440 at commit [`a9cfe29`](https://github.com/apache/spark/commit/a9cfe294b15b2c9675d074645865fa403285d1d2).
[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20697 **[Test build #91260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91260/testReport)** for PR 20697 at commit [`cfb8ee9`](https://github.com/apache/spark/commit/cfb8ee94e11b4871f9b8c7db4774bdb6cb42c40e).
[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21442 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3670/ Test PASSed.
[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21442 Merged build finished. Test PASSed.
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user ssuchter commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191474516 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/Logging.scala --- @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.integrationtest + +import org.apache.log4j.{Logger, LogManager, Priority} + +trait Logging { --- End diff -- Resolved in cfb8ee94e11b4871f9b8c7db4774bdb6cb42c40e --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191473968 --- Diff: python/pyspark/sql/tests.py --- @@ -900,6 +900,22 @@ def __call__(self, x): self.assertEqual(f, f_.func) self.assertEqual(return_type, f_.returnType) +def test_stopiteration_in_udf(self): --- End diff -- Yes please, I am not really familiar with UDFs in general --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r191472949 --- Diff: core/src/test/scala/org/apache/spark/io/ChunkedByteBufferFileRegionSuite.scala --- @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.io + +import java.nio.ByteBuffer +import java.nio.channels.WritableByteChannel + +import scala.util.Random + +import org.mockito.Mockito.when +import org.scalatest.BeforeAndAfterEach +import org.scalatest.mockito.MockitoSugar + +import org.apache.spark.{SparkConf, SparkEnv, SparkFunSuite} +import org.apache.spark.internal.config +import org.apache.spark.util.io.ChunkedByteBuffer + +class ChunkedByteBufferFileRegionSuite extends SparkFunSuite with MockitoSugar +with BeforeAndAfterEach { + + override protected def beforeEach(): Unit = { +super.beforeEach() +val conf = new SparkConf() +val env = mock[SparkEnv] +SparkEnv.set(env) +when(env.conf).thenReturn(conf) + } + + override protected def afterEach(): Unit = { +SparkEnv.set(null) + } + + private def generateChunkByteBuffer(nChunks: Int, perChunk: Int): ChunkedByteBuffer = { +val bytes = (0 until nChunks).map { chunkIdx => + val bb = ByteBuffer.allocate(perChunk) + (0 until perChunk).foreach { idx => +bb.put((chunkIdx * perChunk + idx).toByte) + } + bb.position(0) + bb +}.toArray +new ChunkedByteBuffer(bytes) + } + + test("transferTo can stop and resume correctly") { +SparkEnv.get.conf.set(config.BUFFER_WRITE_CHUNK_SIZE, 9L) +val cbb = generateChunkByteBuffer(4, 10) +val fileRegion = cbb.toNetty + +val targetChannel = new LimitedWritableByteChannel(40) + +var pos = 0L +// write the fileregion to the channel, but with the transfer limited at various spots along +// the way. 
+ +// limit to within the first chunk +targetChannel.acceptNBytes = 5 +pos = fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 5) + +// a little bit further within the first chunk +targetChannel.acceptNBytes = 2 +pos += fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 7) + +// past the first chunk, into the 2nd +targetChannel.acceptNBytes = 6 +pos += fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 13) + +// right to the end of the 2nd chunk +targetChannel.acceptNBytes = 7 +pos += fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 20) + +// rest of 2nd chunk, all of 3rd, some of 4th +targetChannel.acceptNBytes = 15 +pos += fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 35) + +// now till the end +targetChannel.acceptNBytes = 5 +pos += fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 40) + +// calling again at the end should be OK +targetChannel.acceptNBytes = 20 +fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 40) + } + + test(s"transfer to with random limits") { +val rng = new Random() +val seed = System.currentTimeMillis() +logInfo(s"seed = $seed") +rng.setSeed(seed) +val chunkSize = 1e4.toInt +SparkEnv.get.conf.set(config.BUFFER_WRITE_CHUNK_SIZE, rng.nextInt(chunkSize).toLong) + +val cbb = generateChunkByteBuffer(50, chunkSize) +val fileRegion = cbb.toNetty +val transferLimit = 1e5.toInt +val targetChannel = new LimitedWritableByteChannel(transferLimit) +while (targetChannel.pos < cbb.size) { + val nextTransferSize = rng.nextInt(transferLimit) + targetChannel.acceptNBytes = nextTransferSize + fileR
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191472884 --- Diff: python/pyspark/sql/udf.py --- @@ -157,7 +157,17 @@ def _create_judf(self): spark = SparkSession.builder.getOrCreate() sc = spark.sparkContext -wrapped_func = _wrap_function(sc, self.func, self.returnType) +func = fail_on_stopiteration(self.func) + +# for pandas UDFs the worker needs to know if the function takes +# one or two arguments, but the signature is lost when wrapping with +# fail_on_stopiteration, so we store it here +if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF, + PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF, + PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF): +func._argspec = _get_argspec(self.func) --- End diff -- I am not sure how to do that, though; can you suggest a way? Originally this hack was in `fail_on_stopiteration`, but then it was decided to restrict its scope as much as possible. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
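The hack discussed above can be sketched as follows. This is a minimal illustration, not pyspark's actual implementation: the wrapper converts a `StopIteration` escaping the user's function into a `RuntimeError`, and since wrapping loses the original signature, the caller stores it on the wrapper under the `_argspec` attribute name mentioned in the diff.

```python
# Minimal sketch (not pyspark's actual code) of wrapping a user function so
# that StopIteration surfaces as a RuntimeError, while stashing the original
# argspec on the wrapper because wrapping hides the real signature.
import inspect

def fail_on_stopiteration(f):
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as e:
            # A StopIteration leaking out of a UDF would silently end the
            # worker's iteration loop; fail loudly instead.
            raise RuntimeError("StopIteration thrown from user code") from e
    return wrapper

def add_one(x):
    return x + 1

func = fail_on_stopiteration(add_one)
# the wrapper's signature is (*args, **kwargs), so the original argspec
# is stored explicitly for later inspection
func._argspec = inspect.getfullargspec(add_one)

print(func(41))            # 42
print(func._argspec.args)  # ['x']
```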
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r191472540 --- Diff: core/src/test/scala/org/apache/spark/io/ChunkedByteBufferFileRegionSuite.scala --- @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.io + +import java.nio.ByteBuffer +import java.nio.channels.WritableByteChannel + +import scala.util.Random + +import org.mockito.Mockito.when +import org.scalatest.BeforeAndAfterEach +import org.scalatest.mockito.MockitoSugar + +import org.apache.spark.{SparkConf, SparkEnv, SparkFunSuite} +import org.apache.spark.internal.config +import org.apache.spark.util.io.ChunkedByteBuffer + +class ChunkedByteBufferFileRegionSuite extends SparkFunSuite with MockitoSugar +with BeforeAndAfterEach { + + override protected def beforeEach(): Unit = { +super.beforeEach() +val conf = new SparkConf() +val env = mock[SparkEnv] +SparkEnv.set(env) +when(env.conf).thenReturn(conf) + } + + override protected def afterEach(): Unit = { +SparkEnv.set(null) + } + + private def generateChunkByteBuffer(nChunks: Int, perChunk: Int): ChunkedByteBuffer = { +val bytes = (0 until nChunks).map { chunkIdx => + val bb = ByteBuffer.allocate(perChunk) + (0 until perChunk).foreach { idx => +bb.put((chunkIdx * perChunk + idx).toByte) + } + bb.position(0) + bb +}.toArray +new ChunkedByteBuffer(bytes) + } + + test("transferTo can stop and resume correctly") { +SparkEnv.get.conf.set(config.BUFFER_WRITE_CHUNK_SIZE, 9L) +val cbb = generateChunkByteBuffer(4, 10) +val fileRegion = cbb.toNetty + +val targetChannel = new LimitedWritableByteChannel(40) + +var pos = 0L +// write the fileregion to the channel, but with the transfer limited at various spots along +// the way. 
+ +// limit to within the first chunk +targetChannel.acceptNBytes = 5 +pos = fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 5) + +// a little bit further within the first chunk +targetChannel.acceptNBytes = 2 +pos += fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 7) + +// past the first chunk, into the 2nd +targetChannel.acceptNBytes = 6 +pos += fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 13) + +// right to the end of the 2nd chunk +targetChannel.acceptNBytes = 7 +pos += fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 20) + +// rest of 2nd chunk, all of 3rd, some of 4th +targetChannel.acceptNBytes = 15 +pos += fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 35) + +// now till the end +targetChannel.acceptNBytes = 5 +pos += fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 40) + +// calling again at the end should be OK +targetChannel.acceptNBytes = 20 +fileRegion.transferTo(targetChannel, pos) +assert(targetChannel.pos === 40) + } + + test(s"transfer to with random limits") { +val rng = new Random() +val seed = System.currentTimeMillis() +logInfo(s"seed = $seed") +rng.setSeed(seed) +val chunkSize = 1e4.toInt +SparkEnv.get.conf.set(config.BUFFER_WRITE_CHUNK_SIZE, rng.nextInt(chunkSize).toLong) + +val cbb = generateChunkByteBuffer(50, chunkSize) +val fileRegion = cbb.toNetty +val transferLimit = 1e5.toInt +val targetChannel = new LimitedWritableByteChannel(transferLimit) +while (targetChannel.pos < cbb.size) { + val nextTransferSize = rng.nextInt(transferLimit) + targetChannel.acceptNBytes = nextTransferSize + fileR
[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21442 LGTM, pending jenkins --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21442#discussion_r191471982 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -219,10 +219,15 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] { object OptimizeIn extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transform { case q: LogicalPlan => q transformExpressionsDown { - case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral + case In(v, list) if list.isEmpty => +// When v is not nullable, the following expression will be optimized +// to FalseLiteral which is tested in OptimizeInSuite.scala +If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType)) case expr @ In(v, list) if expr.inSetConvertible => val newList = ExpressionSet(list).toSeq -if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) { +if (newList.length == 1) { --- End diff -- This is too minor, I'd like to keep the current code and not break the code flow. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
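The rewrite in the diff above preserves SQL's three-valued logic: `v IN ()` must be false when `v` is non-null, but NULL when `v` is itself NULL. A tiny Python model of the new expression, with `None` standing in for SQL NULL:

```python
# Python model of If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType)):
# `v IN ()` is false for a non-null v, but NULL (None here) when v is NULL.
def in_empty_list(v):
    if v is not None:   # IsNotNull(v)
        return False    # FalseLiteral
    return None         # Literal(null, BooleanType)

print(in_empty_list(1))     # False
print(in_empty_list(None))  # None
```

This is why the old rule needed the `!v.nullable` guard: unconditionally folding to false would have been wrong for a NULL input.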
[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21442 **[Test build #91258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91258/testReport)** for PR 21442 at commit [`7a354fc`](https://github.com/apache/spark/commit/7a354fcd154ec2d8f88a5c1fbf1bd75fdb15ec49). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21442 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r191471289 --- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferFileRegion.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.util.io + +import java.nio.channels.WritableByteChannel + +import io.netty.channel.FileRegion +import io.netty.util.AbstractReferenceCounted + +import org.apache.spark.internal.Logging + + +/** + * This exposes a ChunkedByteBuffer as a netty FileRegion, just to allow sending > 2gb in one netty + * message. This is because netty cannot send a ByteBuf > 2g, but it can send a large FileRegion, + * even though the data is not backed by a file. + */ +private[io] class ChunkedByteBufferFileRegion( +val chunkedByteBuffer: ChunkedByteBuffer, +val ioChunkSize: Int) extends AbstractReferenceCounted with FileRegion with Logging { + + private var _transferred: Long = 0 + // this duplicates the original chunks, so we're free to modify the position, limit, etc. 
+ private val chunks = chunkedByteBuffer.getChunks() + private val cumLength = chunks.scanLeft(0L) { _ + _.remaining()} + private val size = cumLength.last + // Chunk size in bytes + + protected def deallocate: Unit = {} + + override def count(): Long = chunkedByteBuffer.size --- End diff -- no difference, `count()` is just to satisfy an interface. My mistake for having them look different, I'll make them the same --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
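The resume behavior the `transferTo` tests above exercise can be sketched in Python. This is a hedged illustration of the general technique, not Spark's Scala implementation; the names (`transfer_to`, `LimitedChannel`) are hypothetical. Each call skips the bytes already sent (the global position), writes as much as the channel will accept, and returns the number of bytes written this call so the caller can resume later.

```python
# Sketch of resumable chunked transfer: `pos` is the count of bytes already
# sent across previous calls; `channel_write` may accept fewer bytes than
# offered (returning how many it took, possibly 0 when full).
def transfer_to(chunks, channel_write, pos):
    written = 0
    offset = pos
    for chunk in chunks:
        if offset >= len(chunk):
            offset -= len(chunk)   # this chunk was fully sent earlier; skip it
            continue
        while offset < len(chunk):
            n = channel_write(chunk[offset:])
            if n == 0:             # channel is full; stop and resume later
                return written
            offset += n
            written += n
        offset = 0                 # next chunk starts from its beginning
    return written

# A channel that accepts at most `limit` bytes in total, in the spirit of
# the LimitedWritableByteChannel used by the test.
class LimitedChannel:
    def __init__(self, limit):
        self.limit, self.data = limit, bytearray()
    def write(self, b):
        n = min(len(b), self.limit - len(self.data))
        self.data += b[:n]
        return n

chunks = [bytes(range(10)), bytes(range(10, 20))]
ch = LimitedChannel(7)
pos = transfer_to(chunks, ch.write, 0)      # stops after 7 bytes
ch.limit = 20
pos += transfer_to(chunks, ch.write, pos)   # resumes and finishes
assert bytes(ch.data) == b"".join(chunks)
```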
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191470695 --- Diff: python/pyspark/sql/tests.py --- @@ -900,6 +900,22 @@ def __call__(self, x): self.assertEqual(f, f_.func) self.assertEqual(return_type, f_.returnType) +def test_stopiteration_in_udf(self): --- End diff -- We can also merge this PR first. I can follow up with the pandas udf tests. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191469832 --- Diff: python/pyspark/sql/tests.py --- @@ -900,6 +900,22 @@ def __call__(self, x): self.assertEqual(f, f_.func) self.assertEqual(return_type, f_.returnType) +def test_stopiteration_in_udf(self): --- End diff -- Can we also add tests for pandas_udf? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191469562 --- Diff: python/pyspark/sql/udf.py --- @@ -157,7 +157,17 @@ def _create_judf(self): spark = SparkSession.builder.getOrCreate() sc = spark.sparkContext -wrapped_func = _wrap_function(sc, self.func, self.returnType) +func = fail_on_stopiteration(self.func) + +# for pandas UDFs the worker needs to know if the function takes +# one or two arguments, but the signature is lost when wrapping with +# fail_on_stopiteration, so we store it here +if self.evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF, + PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF, + PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF): +func._argspec = _get_argspec(self.func) --- End diff -- Does it make sense for `fail_on_stopiteration` to keep the function signature instead of restoring it here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21383 still lgtm if the tests pass --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21383 **[Test build #91257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91257/testReport)** for PR 21383 at commit [`8fac2a8`](https://github.com/apache/spark/commit/8fac2a80deb79030dee161e0d86b7b090bc892a7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21445 Sure, we need to support this, but the approach in this PR doesn't work if it breaks existing tests. I think the best way to do it is to make the shuffle writer responsible for incrementing the epoch within its task, the same way the data source writer does currently. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...
Github user daniel-shields commented on the issue: https://github.com/apache/spark/pull/21449 I'm not sure that this behavior should be applied to all binary comparisons. It could result in unexpected behavior in some rare cases. For example: `df1.join(df2, df2['x'] < df1['x'])` If 'x' is ambiguous, this would result in the conditional being flipped. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
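The concern above is that resolving the ambiguity by swapping operands changes the meaning of a strict comparison. A plain Python illustration:

```python
# Swapping the operands of a strict comparison inverts which rows match
# whenever the two sides differ, so df2['x'] < df1['x'] must not be
# silently rewritten as df1['x'] < df2['x'].
rows = [(1, 2), (3, 3), (5, 4)]                # (df1.x, df2.x) pairs
original = [r for r in rows if r[1] < r[0]]    # df2['x'] < df1['x']
flipped  = [r for r in rows if r[0] < r[1]]    # operands swapped
assert original == [(5, 4)]
assert flipped == [(1, 2)]
assert original != flipped
```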
[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21445 @LiangchangZ > In the real CP situation, reader and writer may be always in different tasks, right? Continuous mode already supports some valid use cases, and putting everything in one task is fastest for those use cases, though tasks can still be parallelized by partition. Unless we have a valid reason to separate the reader and writer even in a non-shuffle query, it would be better to keep it as it is. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21378: [SPARK-24326][Mesos] add support for local:// sch...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/21378#discussion_r191449756 --- Diff: docs/running-on-mesos.md --- @@ -753,6 +753,16 @@ See the [configuration page](configuration.html) for information on Spark config spark.cores.max is reached ++ + spark.mesos.appJar.local.resolution.mode + host + + Provides support for the local:/// scheme to reference the app jar resource in cluster mode. + If user uses a local resource (local:///path/to/jar) and the config option is not used it defaults to `host` eg. the mesos fetcher tries to get the resource from the host's file system. + If the value is unknown it prints a warning msg in the dispatcher logs and defaults to `host`. + If the value is `container` then spark submit in the container will use the jar in the container's path: /path/to/jar. + --- End diff -- Ok, I haven't built the docs; I will update it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21378: [SPARK-24326][Mesos] add support for local:// sch...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/21378#discussion_r191449547 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -418,17 +417,33 @@ private[spark] class MesosClusterScheduler( envBuilder.build() } + private def isContainerLocalAppJar(desc: MesosDriverDescription): Boolean = { +val isLocalJar = desc.jarUrl.startsWith("local://") +val isContainerLocal = desc.conf.getOption("spark.mesos.appJar.local.resolution.mode").exists { + case "container" => true + case "host" => false + case other => +logWarning(s"Unknown spark.mesos.appJar.local.resolution.mode $other, using host.") +false + } --- End diff -- What's wrong with exists? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21317 @mccheah gentle ping for merge. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21426 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91249/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21426 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21426 **[Test build #91249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91249/testReport)** for PR 21426 at commit [`f015e0d`](https://github.com/apache/spark/commit/f015e0d587c8d9f8cd359fecc325a19362a59c55). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191441634 --- Diff: python/pyspark/sql/udf.py --- @@ -157,7 +157,17 @@ def _create_judf(self): spark = SparkSession.builder.getOrCreate() sc = spark.sparkContext -wrapped_func = _wrap_function(sc, self.func, self.returnType) +func = fail_on_stopiteration(self.func) + +# prevent inspect to fail +# e.g. inspect.getargspec(sum) raises +# TypeError: is not a Python function +try: +func._argspec = _get_argspec(self.func) +except TypeError: --- End diff -- Also, let's leave a comment explaining that this argspec is used for Pandas UDFs, and that the hack is there to keep the original signature of the given function, since there seems to be no way to copy it in Python 2. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21432: [SPARK-24373][SQL] Add AnalysisBarrier to Relatio...
Github user WenboZhao commented on a diff in the pull request: https://github.com/apache/spark/pull/21432#discussion_r191441199 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -63,17 +63,17 @@ class RelationalGroupedDataset protected[sql]( groupType match { case RelationalGroupedDataset.GroupByType => Dataset.ofRows( - df.sparkSession, Aggregate(groupingExprs, aliasedAgg, df.logicalPlan)) + df.sparkSession, Aggregate(groupingExprs, aliasedAgg, df.planWithBarrier)) case RelationalGroupedDataset.RollupType => Dataset.ofRows( - df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), aliasedAgg, df.logicalPlan)) + df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), aliasedAgg, df.planWithBarrier)) case RelationalGroupedDataset.CubeType => Dataset.ofRows( - df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, df.logicalPlan)) + df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, df.planWithBarrier)) case RelationalGroupedDataset.PivotType(pivotCol, values) => val aliasedGrps = groupingExprs.map(alias) Dataset.ofRows( - df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, aggExprs, df.logicalPlan)) + df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, aggExprs, df.planWithBarrier)) --- End diff -- ha~ Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191437141 --- Diff: python/pyspark/sql/udf.py --- @@ -157,7 +157,17 @@ def _create_judf(self): spark = SparkSession.builder.getOrCreate() sc = spark.sparkContext -wrapped_func = _wrap_function(sc, self.func, self.returnType) +func = fail_on_stopiteration(self.func) + +# prevent inspect to fail +# e.g. inspect.getargspec(sum) raises +# TypeError: is not a Python function +try: +func._argspec = _get_argspec(self.func) +except TypeError: --- End diff -- Let's use all of Pandas UDFs: https://github.com/apache/spark/blob/a9350d7095b79c8374fb4a06fd3f1a1a67615f6f/python/pyspark/sql/udf.py#L40-L67 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21405: [SPARK-24361][SQL] Polish code block manipulation API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21405 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91251/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21405: [SPARK-24361][SQL] Polish code block manipulation API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21405 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3669/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20701 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21405: [SPARK-24361][SQL] Polish code block manipulation API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21405 **[Test build #91251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91251/testReport)** for PR 21405 at commit [`e30be7a`](https://github.com/apache/spark/commit/e30be7a3da9061a71e8ac222006e20b4e173cb74). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21260 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3539/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21448: [SPARK-24408][SQL][DOC] Move abs, bitwiseNOT, isnan, nan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21448 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91250/ Test PASSed.
[GitHub] spark issue #21448: [SPARK-24408][SQL][DOC] Move abs, bitwiseNOT, isnan, nan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21448 Merged build finished. Test PASSed.
[GitHub] spark issue #21448: [SPARK-24408][SQL][DOC] Move abs, bitwiseNOT, isnan, nan...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21448 **[Test build #91250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91250/testReport)** for PR 21448 at commit [`487a467`](https://github.com/apache/spark/commit/487a467219014ea2e322861c3053d2a374740058).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21360: [SPARK-24288] Enable preventing predicate pushdown
Github user TomaszGaweda commented on the issue: https://github.com/apache/spark/pull/21360 Hi @maryannxue, thanks for the PR! Could you please rebase it?
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21260 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3539/
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21260 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3668/ Test PASSed.
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21260 Merged build finished. Test PASSed.
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20701 **[Test build #91256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91256/testReport)** for PR 20701 at commit [`e2f68ac`](https://github.com/apache/spark/commit/e2f68ac612227aaafa809ad5f5074d1984aa907e).
[GitHub] spark pull request #21432: [SPARK-24373][SQL] Add AnalysisBarrier to Relatio...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21432#discussion_r191429427

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -63,17 +63,17 @@ class RelationalGroupedDataset protected[sql](
     groupType match {
      case RelationalGroupedDataset.GroupByType =>
        Dataset.ofRows(
-          df.sparkSession, Aggregate(groupingExprs, aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(groupingExprs, aliasedAgg, df.planWithBarrier))
      case RelationalGroupedDataset.RollupType =>
        Dataset.ofRows(
-          df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), aliasedAgg, df.planWithBarrier))
      case RelationalGroupedDataset.CubeType =>
        Dataset.ofRows(
-          df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, df.planWithBarrier))
      case RelationalGroupedDataset.PivotType(pivotCol, values) =>
        val aliasedGrps = groupingExprs.map(alias)
        Dataset.ofRows(
-          df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, aggExprs, df.logicalPlan))
+          df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, aggExprs, df.planWithBarrier))
--- End diff --

It was already completed.
[GitHub] spark pull request #21432: [SPARK-24373][SQL] Add AnalysisBarrier to Relatio...
Github user WenboZhao commented on a diff in the pull request: https://github.com/apache/spark/pull/21432#discussion_r191428684

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -63,17 +63,17 @@ class RelationalGroupedDataset protected[sql](
     groupType match {
      case RelationalGroupedDataset.GroupByType =>
        Dataset.ofRows(
-          df.sparkSession, Aggregate(groupingExprs, aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(groupingExprs, aliasedAgg, df.planWithBarrier))
      case RelationalGroupedDataset.RollupType =>
        Dataset.ofRows(
-          df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(Seq(Rollup(groupingExprs)), aliasedAgg, df.planWithBarrier))
      case RelationalGroupedDataset.CubeType =>
        Dataset.ofRows(
-          df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, df.logicalPlan))
+          df.sparkSession, Aggregate(Seq(Cube(groupingExprs)), aliasedAgg, df.planWithBarrier))
      case RelationalGroupedDataset.PivotType(pivotCol, values) =>
        val aliasedGrps = groupingExprs.map(alias)
        Dataset.ofRows(
-          df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, aggExprs, df.logicalPlan))
+          df.sparkSession, Pivot(Some(aliasedGrps), pivotCol, values, aggExprs, df.planWithBarrier))
--- End diff --

Any plan to make this fix complete?
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191427825

--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
         spark = SparkSession.builder.getOrCreate()
         sc = spark.sparkContext

-        wrapped_func = _wrap_function(sc, self.func, self.returnType)
+        func = fail_on_stopiteration(self.func)
+
+        # prevent inspect to fail
+        # e.g. inspect.getargspec(sum) raises
+        # TypeError: <built-in function sum> is not a Python function
+        try:
+            func._argspec = _get_argspec(self.func)
+        except TypeError:
--- End diff --

Aha, you mean to do the hack in `UserDefinedFunction._create_judf`, and check the condition there? Sorry, I thought we were talking about `_create_udf`. Wilco then
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21260 **[Test build #91255 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91255/testReport)** for PR 21260 at commit [`673fb83`](https://github.com/apache/spark/commit/673fb839cc5c8fa062b35eb6e1d1a9caa833b27b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21260 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91255/ Test FAILed.
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21260 Merged build finished. Test FAILed.
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21260 **[Test build #91255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91255/testReport)** for PR 21260 at commit [`673fb83`](https://github.com/apache/spark/commit/673fb839cc5c8fa062b35eb6e1d1a9caa833b27b).
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191422043

--- Diff: python/pyspark/sql/udf.py ---
@@ -157,7 +157,17 @@ def _create_judf(self):
         spark = SparkSession.builder.getOrCreate()
         sc = spark.sparkContext

-        wrapped_func = _wrap_function(sc, self.func, self.returnType)
+        func = fail_on_stopiteration(self.func)
+
+        # prevent inspect to fail
+        # e.g. inspect.getargspec(sum) raises
+        # TypeError: <built-in function sum> is not a Python function
+        try:
+            func._argspec = _get_argspec(self.func)
+        except TypeError:
--- End diff --

Eh, don't we have `self.evalType` and I thought we could simply check it? I got that the current way is the "recommended" way to deal with divergence in Python but let's just explicitly scope here to make it easier to be fixed.
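[Editor's note] The diff discussed above does two things: it wraps the user's function so that a `StopIteration` escaping user code fails the task loudly instead of being silently swallowed, and it guards the argspec lookup because `inspect` raises `TypeError` for C-implemented callables such as `sum`. A minimal standalone sketch of both ideas follows; `fail_on_stopiteration` mirrors the intent of the helper in the PR, while `safe_argspec` is a hypothetical illustration, not PySpark API:

```python
import inspect


def fail_on_stopiteration(f):
    """Wrap f so a StopIteration raised inside user code surfaces as a
    RuntimeError, rather than being interpreted as normal end-of-iteration
    by whatever is consuming the wrapped function's results."""
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError(
                "Caught StopIteration thrown from user's code; failing the task"
            ) from exc
    return wrapper


def safe_argspec(f):
    """Hypothetical guard: inspect cannot introspect some C-implemented
    callables and raises TypeError for them, so fall back to None."""
    try:
        return inspect.getfullargspec(f)
    except TypeError:
        return None
```

The wrapper only changes the exception type; ordinary return values pass through unchanged, so it is safe to apply unconditionally before serializing the function.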
[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21422 There was a similar(?) argument before - https://github.com/apache/spark/pull/18873#issuecomment-320864941, FYI. @gentlewangyu, let's just close this then. The gain is small anyway.