GitHub user icexelloss opened a pull request:
https://github.com/apache/spark/pull/21397
[SPARK-24334] Fix race condition in ArrowPythonRunner that causes unclean
shutdown of Arrow memory allocator
## What changes were proposed in this pull request?
There is a race condition in closing the Arrow VectorSchemaRoot and Allocator
in the writer thread of ArrowPythonRunner.
The race results in a memory leak exception when the allocator is closed. This
patch removes the closing routine from the TaskCompletionListener and makes the
writer thread responsible for cleaning up the Arrow memory.
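To illustrate the ownership change, here is a minimal sketch, not the actual patch. `ToyAllocator` and `ToyRoot` are hypothetical stand-ins for Arrow's allocator and VectorSchemaRoot, written without any Arrow or Spark dependency. The only behavior assumed from the description above is that closing the allocator fails while memory is still outstanding. The writer thread releases the root and then closes the allocator in its own `finally` block, so no other callback can close the allocator in between.

```java
import java.util.concurrent.atomic.AtomicLong;

public class WriterOwnedCleanup {
    /** Hypothetical stand-in for an allocator: close() fails if memory is still outstanding. */
    static final class ToyAllocator implements AutoCloseable {
        private final AtomicLong outstanding = new AtomicLong();

        void allocate(long bytes) { outstanding.addAndGet(bytes); }
        void release(long bytes) { outstanding.addAndGet(-bytes); }
        long outstandingBytes() { return outstanding.get(); }

        @Override
        public void close() {
            long left = outstanding.get();
            if (left != 0) {
                throw new IllegalStateException("Memory leaked: " + left + " bytes");
            }
        }
    }

    /** Hypothetical stand-in for a root that holds buffers obtained from the allocator. */
    static final class ToyRoot implements AutoCloseable {
        private final ToyAllocator allocator;
        private long held;

        ToyRoot(ToyAllocator allocator) { this.allocator = allocator; }

        void writeBatch(long bytes) {
            allocator.allocate(bytes);
            held += bytes;
        }

        @Override
        public void close() {
            allocator.release(held);
            held = 0;
        }
    }

    /** Runs a writer thread that owns all cleanup; returns bytes still outstanding afterwards. */
    static long runWriter(int batches) {
        ToyAllocator allocator = new ToyAllocator();
        Thread writer = new Thread(() -> {
            ToyRoot root = new ToyRoot(allocator);
            try {
                for (int i = 0; i < batches; i++) {
                    root.writeBatch(1024);
                }
            } finally {
                // Order matters: release the root's buffers first, then close the allocator.
                root.close();
                allocator.close();
            }
        });
        writer.start();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writer", e);
        }
        return allocator.outstandingBytes();
    }

    public static void main(String[] args) {
        System.out.println("outstanding bytes: " + runWriter(1000));
    }
}
```

If a second thread called `allocator.close()` while the writer still held buffers, the toy allocator would throw, which mirrors the memory leak exception described above.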
## How was this patch tested?
Because of the race condition, the bug cannot be unit tested easily. So far
it has only happened with large amounts of data. This is currently tested manually.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/icexelloss/spark SPARK-24334-arrow-memory-leak
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21397.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21397