GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/9427

    [SPARK-11293] Fix shuffle memory leaks in Spillable collections and UnsafeShuffleWriter (branch-1.5)

    This patch fixes multiple memory leaks in `Spillable` collections, as well
    as a leak in `UnsafeShuffleWriter`. There were a small handful of places
    where tasks would acquire memory from the `ShuffleMemoryManager` but would
    not release it by the time the task had ended. The `UnsafeShuffleWriter`
    case was harmless, since the leak could only occur at the very end of a
    task, but the other two cases are somewhat serious:
    
    - `ExternalSorter.stop()` did not release the sorter's memory. In
      addition, `BlockStoreShuffleReader` never called `stop()` once the
      sorter's iterator was fully consumed. Together, these bugs meant that a
      shuffle which performed a reduce-side sort could starve downstream
      pipelined transformations of shuffle memory.
    - `ExternalAppendOnlyMap` exposes no equivalent of `stop()`, and its
      iterators do not automatically free the map's in-memory data once they
      are exhausted. This could cause aggregation operations to starve other
      operations of shuffle memory.
    
    This patch adds a regression test and fixes all three leaks.
    
    This patch was originally opened as #9260; this version is the 1.5.x
    backport.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-11293-branch-1.5-backport

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9427.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9427
    
----
commit 46ef394c1f7e139a36b0e9812d981c60321cce9e
Author: Josh Rosen <[email protected]>
Date:   2015-10-24T00:42:26Z

    Enable shuffle memory leak detection in tests.

----

