[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-06-26 Thread vamaral1
Github user vamaral1 commented on the issue:

https://github.com/apache/spark/pull/21397
  
Thanks for the quick responses. I did try to build everything from scratch 
and am still getting the error on large datasets. If I run on a few tens of GB, 
there's no problem but once it gets to a couple hundred GB, that's when I start 
seeing the issue. I will try to create a reproducible example and post it here 
shortly.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-06-26 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21397
  
@vamaral1, I've seen this error too and I'm trying to remember what the 
cause was. I think it can happen when some files get mixed up while 
updating or building. If you're building your own Spark with this patch, 
first try cleaning everything and rebuilding.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-06-26 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/21397
  
This seems to indicate that the Arrow stream from Java -> Python is closed 
prematurely. If you have a way to reproduce it, I am happy to take a look.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-06-26 Thread vamaral1
Github user vamaral1 commented on the issue:

https://github.com/apache/spark/pull/21397
  
Thanks for the fix. I was having the memory leak issue described in 
[JIRA](https://issues.apache.org/jira/browse/SPARK-24334) when working with 
pandas UDFs, but was able to fix it after upgrading my Spark version to get the 
patch. However, now I'm getting an issue related to the serializer and I'm 
having trouble debugging and understanding the stack trace. Any ideas?

```
INFO TaskSetManager: Lost task [...]
org.apache.spark.api.python.PythonException (Traceback (most recent call last):
  File "/home/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line 230, in main
    process()
  File "/home/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line 225, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/spark-current/python/lib/pyspark.zip/pyspark/serializers.py", line 260, in dump_stream
    for series in iterator:
  File "/home/spark-current/python/lib/pyspark.zip/pyspark/serializers.py", line 279, in load_stream
    for batch in reader:
  File "ipc.pxi", line 268, in __iter__
  File "ipc.pxi", line 284, in pyarrow.lib._RecordBatchReader.read_next_batch
  File "error.pxi", line 79, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: read length must be positive or -1
```


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21397
  
Thank you for fixing this :-)


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-29 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/21397
  
Thanks all for the review!


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21397
  
Merged to master and branch-2.3.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91201/
Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21397
  
**[Test build #91201 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91201/testReport)**
 for PR 21397 at commit 
[`756a73a`](https://github.com/apache/spark/commit/756a73aea843e8d5d90994d127c0d9d4c357c67b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/21397
  
Sure! Added.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21397
  
Btw, can you add a short note to the PR description explaining why the test 
is only in the PR description? Thanks.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21397
  
LGTM too.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3619/
Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21397
  
**[Test build #91201 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91201/testReport)**
 for PR 21397 at commit 
[`756a73a`](https://github.com/apache/spark/commit/756a73aea843e8d5d90994d127c0d9d4c357c67b).


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-26 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21397
  
This looks good to me. It's a little different from the ArrowConverters, 
which also have a listener: they are iterators, so the cleanup can't be put 
in a finally block. I would still like @ueshin to take a look.

Also, I don't think we should include the unit test if it doesn't reproduce 
the issue every time.
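
To illustrate the distinction between cleanup in a finally block and cleanup through a listener, here is a minimal Python sketch. It is not Spark code: `Resource`, `write_all`, `BatchIterator`, and the `on_completion` list are hypothetical names made up for this example. The point is only that a push-style writer has one scope covering the resource's whole lifetime, while a pull-style iterator does not.

```python
class Resource:
    """Hypothetical resource that must be closed exactly when work is done."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def write_all(rows, sink, resource):
    """Push style: the whole lifetime is one call, so finally can own the cleanup."""
    try:
        for row in rows:
            sink.append(row)
    finally:
        # Runs whether the loop finished or raised part-way through.
        resource.close()


class BatchIterator:
    """Pull style: the caller drives iteration and may stop at any point."""

    def __init__(self, rows, resource, on_completion):
        self._rows = iter(rows)
        self._resource = resource
        # No single scope spans the iterator's lifetime, so the cleanup is
        # registered as a callback to be run when the task completes.
        on_completion.append(resource.close)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._rows)
```

In this sketch the writer's resource is closed even if producing the rows raises, while the iterator's resource stays open until whoever owns `on_completion` runs the callbacks.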


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-24 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/21397
  
Hey @BryanCutler, any more thoughts on this?


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/21397
  
Only when the UDF raises an error. In the normal case there is no race, 
because the writer thread always closes the root and allocator before the 
task completion listener runs.
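
The ordering described above can be replayed with a toy model. This is a simplified Python sketch, not the Spark or Arrow implementation: `LeakCheckingAllocator` and `simulate` are hypothetical names, and the assumption that closing an allocator fails while memory is still held is part of the model. Instead of real threads, the two cleanup paths are run in a fixed order so the outcome is deterministic.

```python
class LeakCheckingAllocator:
    """Hypothetical allocator that refuses to close while memory is still held."""

    def __init__(self):
        self.outstanding = 0
        self.closed = False

    def allocate(self, nbytes):
        self.outstanding += nbytes
        return nbytes

    def release(self, nbytes):
        self.outstanding -= nbytes

    def close(self):
        if self.closed:
            return
        if self.outstanding:
            raise RuntimeError(f"memory leaked: {self.outstanding} bytes still allocated")
        self.closed = True


def simulate(listener_runs_first):
    """Replay the two cleanup paths in a fixed order and report the outcome."""
    allocator = LeakCheckingAllocator()
    batch = allocator.allocate(64)  # the writer still holds a batch when the task ends

    def writer_cleanup():
        allocator.release(batch)
        allocator.close()

    def listener_cleanup():
        allocator.close()

    if listener_runs_first:
        order = [listener_cleanup, writer_cleanup]
    else:
        order = [writer_cleanup, listener_cleanup]
    try:
        for cleanup in order:
            cleanup()
    except RuntimeError as exc:
        return f"error: {exc}"
    return "ok"
```

With `simulate(False)` the writer releases its batch before anyone closes the allocator, which corresponds to the normal case. With `simulate(True)` the listener closes the allocator while the batch is still held, and the model reports a leak.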


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21397
  
One more question: do you only observe this when the Python UDF raises an 
error, or have you also seen it during normal runtime operation?


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/21397
  
Yeah, this is a bit tricky because of the race. The test does fail on my 
machine without the fix.

I kept changing the test data size until I could reproduce it, so it's 
not great. If you want to reproduce the memory leak, you probably need to 
increase the data size or something similar.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90991/
Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21397
  
**[Test build #90991 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90991/testReport)**
 for PR 21397 at commit 
[`69c9104`](https://github.com/apache/spark/commit/69c91043981056a732328b474986fe4128936c62).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21397
  
@icexelloss the test you added passes for me without the fix, but I did see 
a message about a suppressed exception in the finally block: 
"java.lang.IllegalStateException: ArrowBuf[3] refCnt has gone negative."
I'll try to look into it in more detail soon.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3475/
Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/21397
  
@BryanCutler @HyukjinKwon I am able to reproduce it in a unit test. 
Please take a look, thanks! :)


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21397
  
**[Test build #90991 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90991/testReport)**
 for PR 21397 at commit 
[`69c9104`](https://github.com/apache/spark/commit/69c91043981056a732328b474986fe4128936c62).


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90972/
Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21397
  
**[Test build #90972 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90972/testReport)**
 for PR 21397 at commit 
[`435ccff`](https://github.com/apache/spark/commit/435ccfff44995ca5ad487e77128b2cae4ff1cfd5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/21397
  
So far I have been using a local Parquet file to test. Let me see if I can 
create one on the fly to reproduce this.



---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21397
  
Yeah, I was wondering about it too. It would be nicer if we had some steps 
in the PR description.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21397
  
I'm guessing making a reliable test is too difficult? Is it possible to 
provide some steps to reproduce?


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3463/
Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21397
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/21397
  
cc @BryanCutler 



---




[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

2018-05-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21397
  
**[Test build #90972 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90972/testReport)**
 for PR 21397 at commit 
[`435ccff`](https://github.com/apache/spark/commit/435ccfff44995ca5ad487e77128b2cae4ff1cfd5).


---
