[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user vamaral1 commented on the issue: https://github.com/apache/spark/pull/21397 Thanks for the quick responses. I did try to build everything from scratch and am still getting the error on large datasets. If I run on a few tens of GB, there's no problem but once it gets to a couple hundred GB, that's when I start seeing the issue. I will try to create a reproducible example and post it here shortly. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21397 @vamaral1, I've seen this error too and I'm trying to remember what the cause was. I think it can happen when some files get mixed up during updating or building. If you're building your own Spark with this patch, first try cleaning everything and rebuilding.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21397 This seems to indicate that the Arrow stream from Java to Python is closed prematurely. If you have a way to reproduce it, I am happy to take a look.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user vamaral1 commented on the issue: https://github.com/apache/spark/pull/21397 Thanks for the fix. I was having the memory leak issue described in [JIRA](https://issues.apache.org/jira/browse/SPARK-24334) when working with pandas UDFs, but was able to fix it after upgrading my Spark version to get the patch. However, now I'm getting an issue related to the serializer and I'm having trouble debugging and understanding the stack trace. Any ideas?
```
INFO TaskSetManager: Lost task [...] org.apache.spark.api.python.PythonException (Traceback (most recent call last):
  File "/home/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line 230, in main
    process()
  File "/home/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line 225, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/spark-current/python/lib/pyspark.zip/pyspark/serializers.py", line 260, in dump_stream
    for series in iterator:
  File "/home/spark-current/python/lib/pyspark.zip/pyspark/serializers.py", line 279, in load_stream
    for batch in reader:
  File "ipc.pxi", line 268, in __iter__
  File "ipc.pxi", line 284, in pyarrow.lib._RecordBatchReader.read_next_batch
  File "error.pxi", line 79, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: read length must be positive or -1
```
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21397 Thank you for fixing this :-)
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21397 Thanks all for review!
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21397 Merged to master and branch-2.3.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Merged build finished. Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91201/ Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21397 **[Test build #91201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91201/testReport)** for PR 21397 at commit [`756a73a`](https://github.com/apache/spark/commit/756a73aea843e8d5d90994d127c0d9d4c357c67b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21397 Sure! Added.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21397 Btw, can you add a short note to the PR description explaining why the test is only in the PR description? Thanks.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21397 LGTM too.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Merged build finished. Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3619/ Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21397 **[Test build #91201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91201/testReport)** for PR 21397 at commit [`756a73a`](https://github.com/apache/spark/commit/756a73aea843e8d5d90994d127c0d9d4c357c67b).
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21397 This seems like it should be good to me. It's a little bit different from the ArrowConverters, which also have a listener, because they are iterators and the cleanup can't be put in a finally block. I would like @ueshin to take a look, though. Also, I don't think we should include the unit test if it doesn't reproduce the issue every time.
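The point about iterators can be illustrated with a loose Python analogy (the PR itself is Scala). This is only a sketch: `ToyTaskContext`, `BatchIterator`, and `consume_partially` are hypothetical names invented here and are not Spark or Arrow APIs. It shows why an iterator that hands out batches across many `next()` calls cannot rely on one `finally` block and instead registers its cleanup with a task-completion listener.

```python
class ToyTaskContext:
    """Hypothetical stand-in for a task context that runs callbacks at task end."""

    def __init__(self):
        self._listeners = []

    def add_completion_listener(self, fn):
        self._listeners.append(fn)

    def mark_completed(self):
        for fn in self._listeners:
            fn()


class BatchIterator:
    """Iterator that holds a resource across many next() calls."""

    def __init__(self, batches, context):
        self._batches = iter(batches)
        self.closed = False
        # No single try/finally spans every next() call, so register cleanup instead.
        context.add_completion_listener(self.close)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._batches)
        except StopIteration:
            self.close()  # eager cleanup when the iterator is fully consumed
            raise

    def close(self):
        self.closed = True  # idempotent: safe if the listener fires again later


def consume_partially():
    ctx = ToyTaskContext()
    it = BatchIterator([1, 2, 3], ctx)
    first = next(it)           # the caller stops early; the iterator never reaches the end
    closed_before = it.closed  # still open at this point
    ctx.mark_completed()       # the listener performs the cleanup
    return first, closed_before, it.closed
```

In this toy model the resource is released only when the task context fires its listeners, which is the situation the comment attributes to the iterator-based converters.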
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21397 Hey @BryanCutler, any more thoughts on this?
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21397 Only when the UDF raises an error. In the normal case, there is no race because the writer thread always closes the root and allocator before the task completion listener runs.
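A minimal Python sketch of the ownership pattern discussed in this thread, assuming the writer thread is the only code that closes its resources and does so in a `finally` block. `ToyAllocator`, `writer`, `run_task`, and `listener_closes_mid_write` are hypothetical names for illustration; they are not the Arrow allocator or the actual ArrowPythonRunner code, and the bad interleaving is replayed deterministically rather than with a real race.

```python
import threading


class ToyAllocator:
    """Hypothetical stand-in for an Arrow allocator; counts buffers still in use."""

    def __init__(self):
        self.outstanding = 0
        self.closed = False

    def allocate(self):
        if self.closed:
            raise RuntimeError("allocate after close")
        self.outstanding += 1

    def release(self):
        self.outstanding -= 1

    def close(self):
        # Closing while buffers are outstanding is reported as a leak.
        if self.outstanding:
            raise RuntimeError(f"memory leak: {self.outstanding} buffer(s) outstanding")
        self.closed = True


def writer(allocator, n_batches, fail_at=None):
    """Writer-thread body: the only code that allocates, releases, or closes."""
    held = 0
    try:
        for i in range(n_batches):
            allocator.allocate()
            held += 1
            if i == fail_at:
                raise ValueError("simulated UDF error")
            allocator.release()
            held -= 1
    except ValueError:
        pass  # a real runner would surface this error to the task
    finally:
        # Runs on success and on failure, always on the writer thread.
        for _ in range(held):
            allocator.release()
        allocator.close()


def run_task(n_batches, fail_at=None):
    allocator = ToyAllocator()
    t = threading.Thread(target=writer, args=(allocator, n_batches, fail_at))
    t.start()
    t.join()
    return allocator


def listener_closes_mid_write():
    """Deterministic replay of the bad interleaving: a second owner closes early."""
    allocator = ToyAllocator()
    allocator.allocate()   # the writer is in the middle of a batch
    try:
        allocator.close()  # a completion listener fires at this moment
    except RuntimeError as exc:
        return str(exc)
    return None
```

With a single owner, `run_task(5)` and `run_task(5, fail_at=2)` both end with the allocator closed and nothing outstanding, while `listener_closes_mid_write()` shows the leak report that a second, concurrent closer can trigger in this toy model.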
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21397 One more question: do you only observe this when the Python UDF raises an error, or have you seen it in normal runtime operation?
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21397 Yeah, this is a bit tricky because of the race. The test does fail on my machine without the fix. I kept changing the test data size until I could reproduce it, so it's not great. If you want to reproduce the memory leak, you probably need to increase the data size or something similar.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90991/ Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Merged build finished. Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21397 **[Test build #90991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90991/testReport)** for PR 21397 at commit [`69c9104`](https://github.com/apache/spark/commit/69c91043981056a732328b474986fe4128936c62). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21397 @icexelloss the test you added passes for me without the fix, but I did see a message about a suppressed exception in the finally block: "java.lang.IllegalStateException: ArrowBuf[3] refCnt has gone negative." I'll try to look into it in more detail soon.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3475/ Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Merged build finished. Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21397 @BryanCutler @HyukjinKwon I was able to reproduce it in a unit test. Please take a look, thanks! :)
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21397 **[Test build #90991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90991/testReport)** for PR 21397 at commit [`69c9104`](https://github.com/apache/spark/commit/69c91043981056a732328b474986fe4128936c62).
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Merged build finished. Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90972/ Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21397 **[Test build #90972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90972/testReport)** for PR 21397 at commit [`435ccff`](https://github.com/apache/spark/commit/435ccfff44995ca5ad487e77128b2cae4ff1cfd5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21397 So far I have been using a local Parquet file to test. Let me see if I can create one on the fly to reproduce this.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21397 Yea, I was wondering about it too. It would be nicer if we had some steps in the PR description.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21397 I'm guessing making a reliable test is too difficult? Is it possible to provide some steps to reproduce?
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3463/ Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21397 Merged build finished. Test PASSed.
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21397 cc @BryanCutler
[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21397 **[Test build #90972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90972/testReport)** for PR 21397 at commit [`435ccff`](https://github.com/apache/spark/commit/435ccfff44995ca5ad487e77128b2cae4ff1cfd5).