Github user vamaral1 commented on the issue:
https://github.com/apache/spark/pull/21397
Thanks for the fix. I was hitting the memory leak described in
[SPARK-24334](https://issues.apache.org/jira/browse/SPARK-24334) when working with
pandas UDFs, and upgrading my Spark version to pick up the patch resolved it.
However, I'm now getting a serializer-related error, and I'm having trouble
debugging and understanding the stack trace. Any ideas?
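For context, here's a minimal sketch of the kind of pandas UDF job I'm running when this shows up (the column names, sample data, and UDF body are placeholders, not my actual pipeline):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, pandas_udf, PandasUDFType
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

# Scalar pandas UDF: each invocation receives a pandas.Series for one
# Arrow record batch and must return a Series of the same length.
@pandas_udf(DoubleType(), PandasUDFType.SCALAR)
def plus_one(v):
    return v + 1.0

# Illustrative data only; my real input is much larger.
df = spark.range(0, 1000).withColumn("x", (col("id") % 10).cast("double"))

# The error appears on actions like this, once record batches are
# streamed back from the Python worker over Arrow.
df.select(plus_one(col("x")).alias("y")).show()
```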
```
INFO TaskSetManager: Lost task [...]
org.apache.spark.api.python.PythonException (Traceback (most recent call last):
File "/home/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line
230, in main
process()
File "/home/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line
225, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/home/spark-current/python/lib/pyspark.zip/pyspark/serializers.py",
line 260, in dump_stream
for series in iterator:
File "/home/spark-current/python/lib/pyspark.zip/pyspark/serializers.py",
line 279, in load_stream
for batch in reader:
File "ipc.pxi", line 268, in __iter__
File "ipc.pxi", line 284, in
pyarrow.lib._RecordBatchReader.read_next_batch
File "error.pxi", line 79, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: read length must be positive or -1
```