GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/9427
[SPARK-11293] Fix shuffle memory leaks in Spillable collections and
UnsafeShuffleWriter (branch-1.5)
This patch fixes multiple memory leaks in `Spillable` collections, as well
as a leak in `UnsafeShuffleWriter`. There were a small handful of places where
tasks would acquire memory from the `ShuffleMemoryManager` but would not
release it by the time the task had ended. The `UnsafeShuffleWriter` case was
harmless, since the leak could only occur at the very end of a task, but the
other two cases are somewhat serious:
- `ExternalSorter.stop()` did not release the sorter's memory. In addition,
`BlockStoreShuffleReader` never called `stop()` once the sorter's iterator was
fully consumed. Put together, these bugs meant that a shuffle which performed a
reduce-side sort could starve downstream pipelined transformations of shuffle
memory.
- `ExternalAppendOnlyMap` exposes no equivalent of `stop()` and its
iterators do not automatically free its in-memory data upon completion. This
could cause aggregation operations to starve other operations of shuffle memory.
This patch adds a regression test and fixes all three leaks.
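
For readers unfamiliar with this class of leak, here is a minimal, self-contained sketch of the general fix pattern: release memory in `stop()`, and call `stop()` automatically once the iterator is exhausted. It is not the code in this patch. `MemoryPool`, `CleanupIterator`, and `BufferedCollection` are hypothetical names invented for illustration and do not mirror the real `ShuffleMemoryManager`, `ExternalSorter`, or `ExternalAppendOnlyMap` APIs.

```java
import java.util.Arrays;
import java.util.Iterator;

public class ShuffleMemoryLeakSketch {

    /** Hypothetical stand-in for a per-task memory pool (not Spark's ShuffleMemoryManager). */
    static final class MemoryPool {
        private long used = 0;
        void acquire(long bytes) { used += bytes; }
        void release(long bytes) { used -= bytes; }
        long used() { return used; }
    }

    /** Wraps an iterator and runs a cleanup callback once, when hasNext() first returns false. */
    static final class CleanupIterator<T> implements Iterator<T> {
        private final Iterator<T> underlying;
        private final Runnable cleanup;
        private boolean cleaned = false;

        CleanupIterator(Iterator<T> underlying, Runnable cleanup) {
            this.underlying = underlying;
            this.cleanup = cleanup;
        }

        @Override
        public boolean hasNext() {
            boolean more = underlying.hasNext();
            if (!more && !cleaned) {
                cleaned = true;
                cleanup.run();
            }
            return more;
        }

        @Override
        public T next() {
            return underlying.next();
        }
    }

    /** Hypothetical in-memory collection that reserves memory on construction. */
    static final class BufferedCollection {
        private final MemoryPool pool;
        private final long reservedBytes;
        private boolean stopped = false;

        BufferedCollection(MemoryPool pool, long reservedBytes) {
            this.pool = pool;
            this.reservedBytes = reservedBytes;
            pool.acquire(reservedBytes);
        }

        /** Idempotent: releases the reservation the first time it is called. */
        void stop() {
            if (!stopped) {
                stopped = true;
                pool.release(reservedBytes);
            }
        }

        /** The returned iterator calls stop() once the caller has consumed every element. */
        Iterator<Integer> iterator() {
            Iterator<Integer> data = Arrays.asList(1, 2, 3).iterator();
            return new CleanupIterator<>(data, this::stop);
        }
    }

    public static void main(String[] args) {
        MemoryPool pool = new MemoryPool();
        BufferedCollection collection = new BufferedCollection(pool, 1024);

        Iterator<Integer> it = collection.iterator();
        while (it.hasNext()) {
            it.next();
        }

        // End-of-task leak check, in the spirit of the leak detection enabled in tests.
        if (pool.used() != 0) {
            throw new IllegalStateException("leaked " + pool.used() + " bytes");
        }
        System.out.println("no leak");
    }
}
```

In this sketch, cleanup fires only when `hasNext()` reports exhaustion, so a consumer that abandons the iterator early still needs an explicit `stop()` call, for example from a task-completion hook.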
This patch was originally opened as #9260; this version is the 1.5.x
backport.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark SPARK-11293-branch-1.5-backport
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9427.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9427
----
commit 46ef394c1f7e139a36b0e9812d981c60321cce9e
Author: Josh Rosen <[email protected]>
Date: 2015-10-24T00:42:26Z
Enable shuffle memory leak detection in tests.
----