[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20519 I hope we can go with the error message one for now if we are all fine here. Will keep my eyes on JIRA and mailing list and check if similar issues are raised again .. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20519 I'm ok with either approach; the stdout approach can still suffer from people doing crazy things in their customizations (like messing with what stdout is), but at that point I'm ok with not supporting those use cases. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/20519 >Can't we just go with the error message for now? Sure, but before we decide not to fix, did you see my comment [above](https://github.com/apache/spark/pull/20519#issuecomment-364677070) on a much less invasive fix to this issue (very little will change but will fix it in most cases). Summary of that comment: daemon.py should write some expected header to stdout before writing the port number to let PythonWorkerFactory know when the daemon's output starts. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20519 For the change itself in this PR. it looks a overkill .. Can't we just go with the error message for now? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/20519 >yea but we can't simply flush and ignore the stdout specifically from sitecustomize unless we define a kind of an additional protocol like this because we can't simply distinguish if the output We might be able to distinguish between sitecustomize.py output and daemon.py output. Assuming the code in the sitecustomize.py is not multi-threaded, we can assume all output from sitecustomize.py comes *before* any output from daemon.py. Therefore, if daemon.py first prints a "magic number" or some other string that is unlikely to show up in sitecustomize.py output, PythonWorkerFactory.startDaemon() will know when daemon.py output starts. daemon.py would print the port number only after printing this magic value. For example: daemon port: ^@^@\325 Once the scala code sees "daemon port: " in the launched process's stdout, it knows the next 4 bytes are the port number. However, if sitecustomize.py starts multi-threaded code (and if that's even possible, that's a corner-corner-corner case), its output could potentially be interleaved with the daemon's output. Also, I am not sure sitecustomize.py output is guaranteed to show up first in stdout, but it seems reasonable that it would. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20519 Hm .. yea but we can't simply flush and ignore the stdout specifically from sitecustomize unless we define a kind of an additional protocol like this because we can't simply distinguish if the output is from sitecustomize or the daemon but they are put int stdout together. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20519 I am not yet fully sure about how sitecustomize fills some bytes to stdout but I believe it doesn't always happen because at least we have it for Python coverage - https://github.com/apache/spark/blob/master/python/test_coverage/sitecustomize.py. The only way I used to verify this PR and https://github.com/apache/spark/pull/20424 was to manually add some bytes to stdout. So, I am still thinking it's a corner case .. Can we otherwise simply flush and remove the stdout right before starting the daemon? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20519 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20519 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87267/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20519 **[Test build #87267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87267/testReport)** for PR 20519 at commit [`e07806e`](https://github.com/apache/spark/commit/e07806e4180a79badfb86ac67fd5707317525456). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20519 **[Test build #87267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87267/testReport)** for PR 20519 at commit [`e07806e`](https://github.com/apache/spark/commit/e07806e4180a79badfb86ac67fd5707317525456). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20519 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87262/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20519 **[Test build #87262 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87262/testReport)** for PR 20519 at commit [`3dc4f9a`](https://github.com/apache/spark/commit/3dc4f9a62c5b0559fa5d9601f17516dcb9e16530). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20519 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20519 **[Test build #87262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87262/testReport)** for PR 20519 at commit [`3dc4f9a`](https://github.com/apache/spark/commit/3dc4f9a62c5b0559fa5d9601f17516dcb9e16530). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20519 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87173/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20519 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20519 **[Test build #87173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87173/testReport)** for PR 20519 at commit [`6c38829`](https://github.com/apache/spark/commit/6c38829ea5c81db0d8bd62fde249f2c01ff780e7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20519 **[Test build #87173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87173/testReport)** for PR 20519 at commit [`6c38829`](https://github.com/apache/spark/commit/6c38829ea5c81db0d8bd62fde249f2c01ff780e7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20519 **[Test build #87171 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87171/testReport)** for PR 20519 at commit [`c63e8f4`](https://github.com/apache/spark/commit/c63e8f4b91e32ef5c3eb87ff5fafb0e7ad30261e). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20519 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87171/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20519 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20519 **[Test build #87171 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87171/testReport)** for PR 20519 at commit [`c63e8f4`](https://github.com/apache/spark/commit/c63e8f4b91e32ef5c3eb87ff5fafb0e7ad30261e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20519 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/20519 >corner case described in the JIRA I don't have a good feeling for how often people use python site customizations. I did encounter a real life case that caused the failure described in the Jira. This is your area, I will leave it up to you. I still have the first PR where we just better report this case rather than try to fix it. The logic of the code is largely the same: - launch the daemon - read the port number from a DataInputStream connected to the daemon. The steps to set up a socket, and the try/catches required, add quite a few lines, even though the end result is the same (read an int from a DataInputStream connected to the daemon). That being said, the code does add some complexities and timing differences. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20519 Isn't it too a invasive change to deal with a corner case described in the JIRA .. ? Did I maybe miss something? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20519 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20519 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org