[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...
Github user attilapiros commented on the issue: https://github.com/apache/spark/pull/20589 jenkins retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20556 Merged build finished. Test FAILed.
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20556 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87439/ Test FAILed.
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20556 **[Test build #87439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87439/testReport)** for PR 20556 at commit [`ee14cf7`](https://github.com/apache/spark/commit/ee14cf708603bd904505a110c0ca5d3607d5cdb8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20605 Also cc @tdas @marmbrus
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20555 Merged build finished. Test PASSed.
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20555 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87438/ Test PASSed.
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20555 **[Test build #87438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87438/testReport)** for PR 20555 at commit [`b6852aa`](https://github.com/apache/spark/commit/b6852aa8a7f0e053985d5b89ba4020792d648f82). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20424#discussion_r168086187 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String daemon = pb.start() val in = new DataInputStream(daemon.getInputStream) -daemonPort = in.readInt() +try { + daemonPort = in.readInt() +} catch { + case _: EOFException => +throw new IOException(s"No port number in $daemonModule's stdout") +} + +// test that the returned port number is within a valid range. +// note: this does not cover the case where the port number +// is arbitrary data but is also coincidentally within range +if (daemonPort < 1 || daemonPort > 0xffff) { + val exceptionMessage = f""" + |Bad data in $daemonModule's standard output. + |Expected valid port number, got 0x$daemonPort%08x. + |PYTHONPATH set to '$pythonPath' + |Python command is '${command.asScala.mkString(" ")}' + |One possibility is a sitecustomize.py module in your python installation + |that is printing to stdout""" --- End diff -- But I think there's no need to put any additional logic here to deal with it, though. Let's just fix the error message to be in a similar format.
[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20424#discussion_r168081472 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String daemon = pb.start() val in = new DataInputStream(daemon.getInputStream) -daemonPort = in.readInt() +try { + daemonPort = in.readInt() +} catch { + case _: EOFException => +throw new IOException(s"No port number in $daemonModule's stdout") +} + +// test that the returned port number is within a valid range. +// note: this does not cover the case where the port number +// is arbitrary data but is also coincidentally within range +if (daemonPort < 1 || daemonPort > 0xffff) { + val exceptionMessage = f""" + |Bad data in $daemonModule's standard output. + |Expected valid port number, got 0x$daemonPort%08x. + |PYTHONPATH set to '$pythonPath' + |Python command is '${command.asScala.mkString(" ")}' + |One possibility is a sitecustomize.py module in your python installation + |that is printing to stdout""" --- End diff -- Oh, I see. I meant it might be located there; please fix a few of my bad words :). One thing I was worried about is that it might print the Python path twice if it contained stderr (it can be wrapped in L232-L235). I think in most cases the Python paths will actually be printed out, because it should usually throw an exception.
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20594 Merged build finished. Test PASSed.
[GitHub] spark pull request #20556: [SPARK-23367][Build] Include python document styl...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20556#discussion_r168079061 --- Diff: dev/tox.ini --- @@ -17,3 +17,5 @@ ignore=E402,E731,E241,W503,E226,E722,E741,E305 max-line-length=100 exclude=cloudpickle.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/* +[pydocstyle] +ignore=D100,D101,D102,D103,D104,D105,D106,D107,D200,D201,D202,D203,D204,D205,D206,D207,D208,D209,D210,D211,D212,D213,D214,D215,D300,D301,D302,D400,D401,D402,D403,D404,D405,D406,D407,D408,D409,D410,D411,D412,D413,D414 --- End diff -- We need a line break at the end of the file.
[GitHub] spark pull request #20556: [SPARK-23367][Build] Include python document styl...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20556#discussion_r168078916 --- Diff: dev/lint-python --- @@ -21,10 +21,15 @@ SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )" SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")" # Exclude auto-generated configuration file. PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" )" +DOC_PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" | grep -vF 'functions.py')" --- End diff -- nit: add an extra space between `'functions.py'` and `)`.
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87440/ Test PASSed.
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20594 **[Test build #87440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87440/testReport)** for PR 20594 at commit [`3a29039`](https://github.com/apache/spark/commit/3a290392e87e6476b3a9253b902850a078dfc4ea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20556: [SPARK-23367][Build] Include python document styl...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20556#discussion_r168078994 --- Diff: dev/lint-python --- @@ -21,10 +21,15 @@ SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )" SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")" # Exclude auto-generated configuration file. PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" )" +DOC_PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" | grep -vF 'functions.py')" PYCODESTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-report.txt" +PYDOCSTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pydocstyle-report.txt" PYLINT_REPORT_PATH="$SPARK_ROOT_DIR/dev/pylint-report.txt" PYLINT_INSTALL_INFO="$SPARK_ROOT_DIR/dev/pylint-info.txt" +PYDOCSTYLEBUILD="pydocstyle" +PYDOCSTYLEVERSION=$(python -c 'import pkg_resources; print pkg_resources.get_distribution("pydocstyle").version') --- End diff -- `..; print(pkg_resources.get_distribution("pydocstyle").version)' 2> /dev/null)`
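ueshin's suggestion above — wrapping the version lookup in `print(...)` so it works on both Python 2 and 3, and redirecting stderr to `/dev/null` so a missing package does not pollute the output — can be sketched as a standalone snippet. This is a hedged illustration, not the actual `dev/lint-python` script; the `PYTHON` variable and the echoed messages are invented for the example:

```shell
# Pick the interpreter; dev/lint-python uses plain `python`, but we allow an
# override here so the sketch runs on Python-3-only machines.
PYTHON="${PYTHON_EXECUTABLE:-python3}"

# print(...) is valid on Python 2 and 3; `print x` (as in the diff above) is
# Python-2-only. 2> /dev/null swallows the ImportError traceback when
# pydocstyle (or setuptools) is not installed, leaving the variable empty.
PYDOCSTYLE_VERSION="$("$PYTHON" -c 'import pkg_resources; print(pkg_resources.get_distribution("pydocstyle").version)' 2> /dev/null)"

if [ -z "$PYDOCSTYLE_VERSION" ]; then
    echo "pydocstyle is not installed; skipping Python doc style checks."
else
    echo "Found pydocstyle $PYDOCSTYLE_VERSION"
fi
```

Either way the script exits 0, which matches the lint script's intent of treating a missing pydocstyle as "skip" rather than "fail".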
[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...
Github user bersprockets commented on a diff in the pull request: https://github.com/apache/spark/pull/20424#discussion_r168080320 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String daemon = pb.start() val in = new DataInputStream(daemon.getInputStream) -daemonPort = in.readInt() +try { + daemonPort = in.readInt() +} catch { + case _: EOFException => +throw new IOException(s"No port number in $daemonModule's stdout") +} + +// test that the returned port number is within a valid range. +// note: this does not cover the case where the port number +// is arbitrary data but is also coincidentally within range +if (daemonPort < 1 || daemonPort > 0xffff) { + val exceptionMessage = f""" + |Bad data in $daemonModule's standard output. + |Expected valid port number, got 0x$daemonPort%08x. + |PYTHONPATH set to '$pythonPath' + |Python command is '${command.asScala.mkString(" ")}' + |One possibility is a sitecustomize.py module in your python installation + |that is printing to stdout""" --- End diff -- Nice, except this one thing: >This module 'sitecustomize.py' can be located in your Python path: /.../spark/python/lib/pyspark.zip:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/python/: I display the path because maybe you accidentally configured the path to have some old .zip or other incompatible versions of expected python modules on your path. sitecustomize.py might not be in your path, although it could be. I found a machine that had it here: /usr/lib/python2.7/sitecustomize.py. On another, I found it here: /usr/lib64/python2.7/sitecustomize.py
[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20605 Merged build finished. Test PASSed.
[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20605 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87437/ Test PASSed.
[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20605 **[Test build #87437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87437/testReport)** for PR 20605 at commit [`b91a3af`](https://github.com/apache/spark/commit/b91a3af185fa8c953648080c24c71664b1ebe646). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.f...
Github user cxzl25 closed the pull request at: https://github.com/apache/spark/pull/20593
[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20593 Thanks! Merged to 2.2. Could you close it?
[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20593 Supporting Hadoop 3.0 is being discussed. Now, we do not support it yet.
[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20424#discussion_r168076901 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -17,7 +17,7 @@ package org.apache.spark.api.python -import java.io.{DataInputStream, DataOutputStream, InputStream, OutputStreamWriter} +import java.io._ --- End diff -- BTW, don't forget to get rid of this import change too. This import looks needed due to `IOException`.
[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20424#discussion_r168075968 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String daemon = pb.start() val in = new DataInputStream(daemon.getInputStream) -daemonPort = in.readInt() +try { + daemonPort = in.readInt() +} catch { + case _: EOFException => +throw new IOException(s"No port number in $daemonModule's stdout") +} + +// test that the returned port number is within a valid range. +// note: this does not cover the case where the port number +// is arbitrary data but is also coincidentally within range +if (daemonPort < 1 || daemonPort > 0xffff) { + val exceptionMessage = f""" + |Bad data in $daemonModule's standard output. + |Expected valid port number, got 0x$daemonPort%08x. + |PYTHONPATH set to '$pythonPath' + |Python command is '${command.asScala.mkString(" ")}' + |One possibility is a sitecustomize.py module in your python installation + |that is printing to stdout""" + throw new IOException(exceptionMessage.stripMargin) --- End diff -- I think this rather should be `SparkException`.
[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20424#discussion_r168075909 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String daemon = pb.start() val in = new DataInputStream(daemon.getInputStream) -daemonPort = in.readInt() +try { + daemonPort = in.readInt() +} catch { + case _: EOFException => +throw new IOException(s"No port number in $daemonModule's stdout") +} + +// test that the returned port number is within a valid range. +// note: this does not cover the case where the port number +// is arbitrary data but is also coincidentally within range +if (daemonPort < 1 || daemonPort > 0xffff) { + val exceptionMessage = f""" + |Bad data in $daemonModule's standard output. + |Expected valid port number, got 0x$daemonPort%08x. + |PYTHONPATH set to '$pythonPath' + |Python command is '${command.asScala.mkString(" ")}' + |One possibility is a sitecustomize.py module in your python installation + |that is printing to stdout""" --- End diff -- Shall we keep the format the same as: https://github.com/apache/spark/blob/b63abee881f2b4379f375500d51fdef706d6d512/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L232-L235 ? The current message looks like: ``` ... Caused by: java.io.IOException: Bad data in pyspark.daemon's standard output. Expected valid port number, got 0x4920616d. PYTHONPATH set to '/.../spark/python/lib/pyspark.zip:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/python/:' Python command is 'python -m pyspark.daemon' Check if you have a sitecustomize.py module in your python installation. ... ``` I made a suggestion while verifying this PR: ``` ... Error from bad data in pyspark.daemon's standard output. Invalid port number: 1633771786 (0x6161610a) Python command to execute the daemon was: python -m pyspark.daemon One possibility is a sitecustomize module printing some data to the standard output. This module 'sitecustomize.py' can be located in your Python path: /.../spark/python/lib/pyspark.zip:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/python/: ... ``` Here is the diff I used. I also did some insane nitpicks here as well. ```diff diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala b/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala index 5790c050a7f..b44aa6064bb 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala @@ -196,20 +196,24 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String daemonPort = in.readInt() } catch { case _: EOFException => -throw new IOException(s"No port number in $daemonModule's stdout") +throw new SparkException(s"No port number in $daemonModule's standard output.") } -// test that the returned port number is within a valid range. -// note: this does not cover the case where the port number -// is arbitrary data but is also coincidentally within range +// Check if the returned port number is within a valid range. +// Note: this does not cover the case where the port number is arbitrary data but is +// also coincidentally within range. if (daemonPort < 1 || daemonPort > 0xffff) { val exceptionMessage = f""" - |Bad data in $daemonModule's standard output. - |Expected valid port number, got 0x$daemonPort%08x. - |PYTHONPATH set to '$pythonPath' - |Python command is '${command.asScala.mkString(" ")}' - |One possibility is a sitecustomize.py module in your python installation - |that is printing to stdout""" + |Error from bad data in $daemonModule's standard output. Invalid port number: + | $daemonPort (0x$daemonPort%08x) + |Python command to
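The range check being debated in this thread can be illustrated outside of Spark. The sketch below is a hypothetical Python rendering (not Spark's actual Scala code) of the same idea: read a big-endian 32-bit integer the way `java.io.DataInputStream.readInt` does, then reject values outside the valid TCP port range. It also shows how stray stdout text such as `"I am"` decodes to the `0x4920616d` value quoted above; as the PR comment notes, arbitrary bytes that coincidentally decode into range would still slip through.

```python
import struct

def read_daemon_port(stream_bytes: bytes) -> int:
    """Decode a big-endian 32-bit port number, as DataInputStream.readInt would."""
    if len(stream_bytes) < 4:
        # Mirrors the EOFException -> "No port number in the daemon's stdout" case.
        raise EOFError("No port number in the daemon's stdout")
    (port,) = struct.unpack(">i", stream_bytes[:4])
    # Validate the range; this cannot catch garbage that happens to be in range.
    if port < 1 or port > 0xFFFF:
        raise IOError(
            f"Bad data in the daemon's stdout: 0x{port & 0xFFFFFFFF:08x}"
        )
    return port

# A daemon that writes its port correctly:
print(read_daemon_port(struct.pack(">i", 45321)))  # 45321

# A sitecustomize.py printing "I am ..." first: the leading four bytes
# "I am" decode to 0x4920616d, which fails the range check.
try:
    read_daemon_port(b"I am printing to stdout")
except IOError as exc:
    print(exc)
```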
[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20424 Thank you @squito .. LGTM otherwise.
[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20424#discussion_r168075988 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String daemon = pb.start() val in = new DataInputStream(daemon.getInputStream) -daemonPort = in.readInt() +try { + daemonPort = in.readInt() +} catch { + case _: EOFException => +throw new IOException(s"No port number in $daemonModule's stdout") --- End diff -- Here too, `SparkException`.
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20594 **[Test build #87440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87440/testReport)** for PR 20594 at commit [`3a29039`](https://github.com/apache/spark/commit/3a290392e87e6476b3a9253b902850a078dfc4ea).
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/888/ Test PASSed.
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20594 Merged build finished. Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87436/ Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Merged build finished. Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20554 **[Test build #87436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87436/testReport)** for PR 20554 at commit [`f9983d9`](https://github.com/apache/spark/commit/f9983d937b97b2ba9f020f370b8aefdd353a654b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20556 **[Test build #87439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87439/testReport)** for PR 20556 at commit [`ee14cf7`](https://github.com/apache/spark/commit/ee14cf708603bd904505a110c0ca5d3607d5cdb8).
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/887/ Test PASSed.
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20556 Merged build finished. Test PASSed.
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20555 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87435/ Test PASSed.
[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...
Github user rekhajoshm commented on the issue: https://github.com/apache/spark/pull/20556 build error unrelated to this PR. retest this please.
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20555 Merged build finished. Test PASSed.
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20555 **[Test build #87435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87435/testReport)** for PR 20555 at commit [`52f4a7c`](https://github.com/apache/spark/commit/52f4a7c3e97c475b4464b82c7d8e00dcd9d889b3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...
Github user PandaMonkey commented on the issue: https://github.com/apache/spark/pull/20593 @dongjoon-hyun Hi, I have a digressing question: the latest version of Hadoop is 3.0.0, so why does Spark still use Hadoop 2.6.5? Does Spark plan to upgrade Hadoop from 2.6.5 to 3.0.0? I am only a downstream user of Spark, and we encountered some dependency conflict problems. I'm not sure if it is suitable to ask this question here; if you have an upgrade plan, I can report this in Jira.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87433/ Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Build finished. Test PASSed.
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20594 Because this is a quick fix, my idea is to have a surface patch that doesn't change the existing API. The approach of adding a parameter to `DefaultParamsWriter.saveMetadata` also sounds good to me, but the parameter seems useless once we get rid of this quick fix in the future. Instead of adding a parameter, I think we can pass the `paramMap` parameter when calling `saveMetadata`. For #20410 and #18982, I have a question: are they regressions? It seems to me they are not new issues in 2.3.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20554 **[Test build #87433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87433/testReport)** for PR 20554 at commit [`f9983d9`](https://github.com/apache/spark/commit/f9983d937b97b2ba9f020f370b8aefdd353a654b). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87434/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #87434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87434/testReport)** for PR 20387 at commit [`3b55609`](https://github.com/apache/spark/commit/3b55609b605fb461f6c2616d1da95a2d4b27ff4b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20594#discussion_r168069150 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -290,6 +293,27 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] { } } + + private[Bucketizer] class BucketizerWriter(instance: Bucketizer) extends MLWriter { + +override protected def saveImpl(path: String): Unit = { + // SPARK-23377: The default params will be saved and loaded as user-supplied params. + // Once `inputCols` is set, the default value of the `outputCol` param causes an error + // when checking exclusive params. As a temporary fix, we remove the default + // value of `outputCol` if `inputCols` is set before saving. + // TODO: If we modify the persistence mechanism later to better handle default params, + // we can get rid of this. + var removedOutputCol: Option[String] = None + if (instance.isSet(instance.inputCols)) { --- End diff -- Yes, I think #20410 is not related to this PR for now. But I am afraid that in the future, as we add more functionality, latent bugs could be triggered. Still, I don't think we need to care about the order in which they are merged. :)
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20555 **[Test build #87438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87438/testReport)** for PR 20555 at commit [`b6852aa`](https://github.com/apache/spark/commit/b6852aa8a7f0e053985d5b89ba4020792d648f82).
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20555 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/886/ Test PASSed.
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20555 Merged build finished. Test PASSed.
[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20599 @mgaido91 As I said in the PR description, no regression has been found so far; this just provides a config to be extra safe. Actually, this PR has a problem: codegen usually happens on the executor side, so we can't use SQLConf directly. I'll figure this out after my vacation.
[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20594#discussion_r168067865 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -290,6 +293,27 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] { } } + + private[Bucketizer] class BucketizerWriter(instance: Bucketizer) extends MLWriter { + +override protected def saveImpl(path: String): Unit = { + // SPARK-23377: The default params will be saved and loaded as user-supplied params. + // Once `inputCols` is set, the default value of the `outputCol` param causes an error + // when checking exclusive params. As a temporary fix, we remove the default + // value of `outputCol` if `inputCols` is set before saving. + // TODO: If we modify the persistence mechanism later to better handle default params, + // we can get rid of this. + var removedOutputCol: Option[String] = None --- End diff -- yep. But I have some new thoughts, see my comments at the bottom. :)
[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20594 I thought about it again: instead of removing the default value and restoring it later (which may cause side effects), maybe the better way is to add a parameter to `DefaultParamsWriter.saveMetadata` that specifies which default params to skip when saving. @mgaido91 Yes, I agree with you. Either #20410 or #18982 needs to be merged into 2.3; the related issue could cause some strange bugs. cc @jkbradley
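The proposal above (pass `saveMetadata` the names of default params to skip, instead of removing a default value and restoring it later) can be sketched as follows. This is a hypothetical helper, not Spark's actual `DefaultParamsWriter` API; the method and param names are illustrative only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, NOT Spark's DefaultParamsWriter: rather than mutating
// the estimator (remove default, save, restore), the writer filters the
// default-param map against a caller-supplied set of names to skip before
// serializing metadata.
public class MetadataSketch {
  static Map<String, Object> defaultsToSave(Map<String, Object> defaultParams,
                                            Set<String> skipParams) {
    return defaultParams.entrySet().stream()
        .filter(e -> !skipParams.contains(e.getKey()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  public static void main(String[] args) {
    Map<String, Object> defaults = new HashMap<>();
    defaults.put("outputCol", "bucketizer_output");
    defaults.put("handleInvalid", "error");
    // When `inputCols` is set, skip the default `outputCol` so it is never
    // persisted (and later re-loaded) as a user-supplied param.
    Map<String, Object> saved =
        defaultsToSave(defaults, Collections.singleton("outputCol"));
    System.out.println(saved.containsKey("outputCol")); // prints "false"
  }
}
```

The estimator itself is left untouched, which avoids the side effects of the remove-and-restore approach discussed above.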
[GitHub] spark pull request #20599: [SPARK-23407][SQL] add a config to try to inline ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20599#discussion_r168067551 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1461,20 +1465,19 @@ object CodeGenerator extends Logging { CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.update(classBytes.length) try { val cf = new ClassFile(new ByteArrayInputStream(classBytes)) -val stats = cf.methodInfos.asScala.flatMap { method => +cf.methodInfos.asScala.flatMap { method => --- End diff -- No, just a small cleanup.
[GitHub] spark pull request #20599: [SPARK-23407][SQL] add a config to try to inline ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20599#discussion_r168067521 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -675,6 +675,16 @@ object SQLConf { "disable logging or -1 to apply no limit.") .createWithDefault(1000) + val CODEGEN_TRY_INLINE_ALL_STATES = +buildConf("spark.sql.codegen.tryInlineAllStates") +.internal() +.doc("When adding mutable states during code generation, whether or not we should try to " + + "inline all the states. If this config is false, we only try to inline primitive states, " + + "so that primitive states are more likely to be inlined. Set this config to true to make " + + "the behavior the same as in Spark 2.2.") --- End diff -- yea, let me improve it.
[GitHub] spark pull request #20590: [SPARK-23399][SQL] Register a task completion lis...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20590
[GitHub] spark issue #20590: [SPARK-23399][SQL] Register a task completion listener f...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20590 thanks, merging to master/2.3!
[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20555#discussion_r168066120 --- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java --- @@ -230,6 +227,7 @@ private void signalAsyncReadComplete() { private void waitForAsyncReadComplete() throws IOException { stateChangeLock.lock(); +isWaiting.set(true); try { while (readInProgress) { --- End diff -- shall we add a comment about spurious wakeup? Otherwise someone else may still mistakenly remove it in the future.
[GitHub] spark pull request #20598: [SPARK-23406] [SS] Enable stream-stream self-join...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20598#discussion_r168064805 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala --- @@ -62,7 +64,7 @@ case class StreamingRelation(dataSource: DataSource, sourceName: String, output: case class StreamingExecutionRelation( --- End diff -- not very familiar with the streaming side, but IIRC, some of these plans are temporary and will be replaced before entering the analyzer, and these plans don't need to extend MultiInstanceRelation.
[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20605 Merged build finished. Test PASSed.
[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20605 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/885/ Test PASSed.
[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20605 **[Test build #87437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87437/testReport)** for PR 20605 at commit [`b91a3af`](https://github.com/apache/spark/commit/b91a3af185fa8c953648080c24c71664b1ebe646).
[GitHub] spark pull request #20605: [SPARK-23419][SPARK-23416][SS] data source v2 wri...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/20605 [SPARK-23419][SPARK-23416][SS] data source v2 write path should re-throw interruption exceptions directly ## What changes were proposed in this pull request? Streaming execution has a list of exceptions that mean interruption, and handles them specially. `WriteToDataSourceV2Exec` should also respect this list and not wrap them with `SparkException`. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark write Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20605.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20605 commit b91a3af185fa8c953648080c24c71664b1ebe646 Author: Wenchen Fan Date: 2018-02-14T02:18:35Z data source v2 write path should re-throw interruption exceptions directly
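The "list of exceptions that mean interruption" idea from the PR description can be sketched as a small classifier (illustrative only; Spark's actual check in `StreamExecution` differs in detail): an exception counts as a benign stream stop if it is, or wraps, a thread interruption, so the write path should surface it unwrapped rather than hide it inside a generic `SparkException`.

```java
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ExecutionException;

// Illustrative sketch, not Spark's code: decide whether a throwable means
// "the stream was interrupted" by checking the exception itself and, for
// known wrapper types, its cause recursively.
public class InterruptionCheck {
  static boolean isInterruption(Throwable t) {
    if (t instanceof InterruptedException || t instanceof InterruptedIOException) {
      return true;
    }
    // Wrappers such as UncheckedIOException or ExecutionException count when
    // their cause is itself an interruption.
    if ((t instanceof UncheckedIOException || t instanceof ExecutionException)
        && t.getCause() != null) {
      return isInterruption(t.getCause());
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isInterruption(new InterruptedException())); // true
    System.out.println(isInterruption(
        new ExecutionException(new InterruptedException()))); // true
    System.out.println(isInterruption(new RuntimeException("boom"))); // false
  }
}
```

A write path using such a check would re-throw interruption-like exceptions as-is, and only wrap everything else.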
[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20605 cc @zsxwing @jose-torres @mgaido91
[GitHub] spark pull request #20602: [SPARK-23416][SS] handle streaming interrupts in ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20602#discussion_r168060518 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -369,7 +370,11 @@ abstract class StreamExecution( // exception // UncheckedExecutionException - thrown by codes that cannot throw a checked // ExecutionException, such as BiFunction.apply -case e2 @ (_: UncheckedIOException | _: ExecutionException | _: UncheckedExecutionException) +// SparkException - thrown if the interrupt happens in the middle of an RPC wait --- End diff -- does it mean this issue has nothing to do with `WriteToDataSourceV2Exec`?
[GitHub] spark issue #20604: [WIP][SPARK-23365][CORE] Do not adjust num executors whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20604 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87429/ Test PASSed.
[GitHub] spark issue #20604: [WIP][SPARK-23365][CORE] Do not adjust num executors whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20604 Merged build finished. Test PASSed.
[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20589 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87430/ Test FAILed.
[GitHub] spark issue #20604: [WIP][SPARK-23365][CORE] Do not adjust num executors whe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20604 **[Test build #87429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87429/testReport)** for PR 20604 at commit [`b5a39da`](https://github.com/apache/spark/commit/b5a39dab324be4bb358682720cf0f7e55272559d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20589 Merged build finished. Test FAILed.
[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20589 **[Test build #87430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87430/testReport)** for PR 20589 at commit [`cdc5168`](https://github.com/apache/spark/commit/cdc5168f721eed6b3634edd9eaaae8965b295ceb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20594#discussion_r168057330 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -290,6 +293,27 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] { } } + + private[Bucketizer] class BucketizerWriter(instance: Bucketizer) extends MLWriter { + +override protected def saveImpl(path: String): Unit = { + // SPARK-23377: The default params will be saved and loaded as user-supplied params. + // Once `inputCols` is set, the default value of the `outputCol` param causes an error + // when checking exclusive params. As a temporary fix, we remove the default + // value of `outputCol` if `inputCols` is set before saving. + // TODO: If we modify the persistence mechanism later to better handle default params, + // we can get rid of this. + var removedOutputCol: Option[String] = None + if (instance.isSet(instance.inputCols)) { --- End diff -- Why? I think they are orthogonal, and this shouldn't cause the issue on the Python side. Besides, as the PySpark multi-column support is not added yet (it's reverted), I think we don't hit the Python API issue. This is a quick fix to deal with the persistence bug. I'm not sure we should be blocked.
[GitHub] spark issue #20603: [SPARK-23418][SQL]: Fail DataSourceV2 reads when user sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20603 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87431/ Test FAILed.
[GitHub] spark issue #20596: [SPARK-23404][CORE]When the underlying buffers are direc...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20596 Can you please elaborate on the case that motivates your fix here?
[GitHub] spark issue #20603: [SPARK-23418][SQL]: Fail DataSourceV2 reads when user sc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20603 Merged build finished. Test FAILed.
[GitHub] spark issue #20603: [SPARK-23418][SQL]: Fail DataSourceV2 reads when user sc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20603 **[Test build #87431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87431/testReport)** for PR 20603 at commit [`bd06193`](https://github.com/apache/spark/commit/bd06193a1f9d2a6289a1fad768904ccb017ada56). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87432/ Test FAILed.
[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test FAILed.
[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #87432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87432/testReport)** for PR 20387 at commit [`b8e3623`](https://github.com/apache/spark/commit/b8e3623837047949b39141e46eb96f30de8aa21e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/884/ Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20554 Merged build finished. Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20554 **[Test build #87436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87436/testReport)** for PR 20554 at commit [`f9983d9`](https://github.com/apache/spark/commit/f9983d937b97b2ba9f020f370b8aefdd353a654b).
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20555 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/883/ Test PASSed.
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20555 Merged build finished. Test PASSed.
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20554 retest this please
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20555 LGTM
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20555 **[Test build #87435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87435/testReport)** for PR 20555 at commit [`52f4a7c`](https://github.com/apache/spark/commit/52f4a7c3e97c475b4464b82c7d8e00dcd9d889b3).
[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20555#discussion_r168048937 --- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java --- @@ -78,9 +79,8 @@ // whether there is a read ahead task running, private boolean isReading; - // If the remaining data size in the current buffer is below this threshold, - // we issue an async read from the underlying input stream. - private final int readAheadThresholdInBytes; + // whether there is a reader waiting for data. + private AtomicBoolean isWaiting = new AtomicBoolean(false); --- End diff -- I'll leave it as is - it should compile to basically the same thing, and using `AtomicBoolean` makes the intent more readable to me.
[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20555#discussion_r168048795 --- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java --- @@ -230,24 +227,32 @@ private void signalAsyncReadComplete() { private void waitForAsyncReadComplete() throws IOException { stateChangeLock.lock(); +isWaiting.set(true); try { - while (readInProgress) { + if (readInProgress) { --- End diff -- Good catch, thanks!
[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20387 Okay, I rebased again after SPARK-23303 was reverted.
[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20373 >Does the hijacking of the namedtuple still cause problems on Python 3.6? I'm not too familiar with the history of this, but I ran PySpark tests that cover namedtuples with 3.6.3 and all passed.
[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20387 **[Test build #87434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87434/testReport)** for PR 20387 at commit [`3b55609`](https://github.com/apache/spark/commit/3b55609b605fb461f6c2616d1da95a2d4b27ff4b).
[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/882/ Test PASSed.
[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20373 I think it's fine in cloudpickle, but Spark has the hijacking for regular pickling. I was thinking of a possibility for a deduplicated fix, but that might have to be investigated separately. Let's hold off on this a bit until the 2.3.0 release, as it's going to go into master anyway (I think). The release seems to have been delayed unexpectedly, and we'd better keep the diff small between master and branch-2.3 for now. I will keep my eyes on this PR anyway.
[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20387 Merged build finished. Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20477 Thank you very much @gatorsmile, I promise I will do a proper review of the streaming side when you reopen this PR.
[GitHub] spark pull request #20474: [SPARK-23235][Core] Add executor Threaddump to ap...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20474