GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/8011

    [SPARK-9709] [SQL] Avoid starving unsafe operators that use sort

    The issue is that a task may run multiple sorts, and the sorts run by the 
child operator (i.e. parent RDD) may acquire all available memory, leaving 
other sorts in the same task without enough memory to proceed. This manifests 
as an `Unable to acquire X bytes of memory` error in `UnsafeExternalSorter`.
    
    The solution is to reserve a page in each sorter in the chain before 
computing the child operator's (parent RDD's) partitions. This requires a new 
special RDD, `MapPartitionsWithPreparationRDD`, that runs a preparation step 
before computing the parent's partitions.
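
    The starvation and the fix can be illustrated with a toy model. This is a 
minimal sketch, not Spark code: `MemoryPool`, `Sorter`, and `run_task` are 
hypothetical names invented for illustration, and memory is counted in whole 
pages. The "upstream" sorter stands for the child operator (parent RDD), which 
runs first and greedily takes whatever memory is left.

```python
class MemoryPool:
    """Toy stand-in for a task's memory budget, counted in pages (hypothetical)."""

    def __init__(self, pages):
        self.free = pages

    def acquire(self, pages):
        # Grant as many of the requested pages as are still free.
        granted = min(pages, self.free)
        self.free -= granted
        return granted


class Sorter:
    """Toy sorter that needs at least one page to make progress (hypothetical)."""

    def __init__(self, name, pool):
        self.name = name
        self.pool = pool
        self.pages = 0

    def reserve_page(self):
        self.pages += self.pool.acquire(1)

    def sort(self, rows, greedy=False):
        if greedy:
            # Model a sort that grabs all memory still available to the task.
            self.pages += self.pool.acquire(self.pool.free)
        elif self.pages == 0:
            self.reserve_page()
        if self.pages == 0:
            raise MemoryError(f"{self.name}: unable to acquire 1 page")
        return sorted(rows)


def run_task(rows, prepare_first):
    pool = MemoryPool(pages=4)
    upstream = Sorter("upstream", pool)      # child operator / parent RDD
    downstream = Sorter("downstream", pool)  # sort that runs later in the same task
    if prepare_first:
        # Preparation step: reserve a page before the upstream sort runs.
        downstream.reserve_page()
    intermediate = upstream.sort(rows, greedy=True)
    return downstream.sort(intermediate)
```

    In this model, `run_task(rows, prepare_first=False)` raises `MemoryError` 
because the upstream sort has taken every page, while 
`run_task(rows, prepare_first=True)` completes because the downstream sorter 
already holds its reserved page.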

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark unsafe-starve-memory

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8011.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8011
    
----
commit b889e08d240c099c18eb01c2565dbac94c0b52fd
Author: Andrew Or <[email protected]>
Date:   2015-08-06T19:04:30Z

    MapPartitionsWithPreparationRDD

commit 234acbde642cd6bfc1ae30eea951eb58f5bd9706
Author: Andrew Or <[email protected]>
Date:   2015-08-06T21:45:47Z

    Reserve a page in sorter when preparing each partition

commit 254032e75d5967ae7af7fc8cd6f9ef691f1c5bde
Author: Andrew Or <[email protected]>
Date:   2015-08-06T22:25:48Z

    Add tests
    
    The MapPartitionsWithPreparationRDDSuite simulates the condition
    we are trying to fix, which is that the child can acquire memory
    before the parent.

commit fdd3c92be8cef2f0ad1536676dca94fc6608fd82
Author: Andrew Or <[email protected]>
Date:   2015-08-06T22:26:46Z

    Merge branch 'master' of github.com:apache/spark into unsafe-starve-memory

----


