GitHub user andrewor14 opened a pull request:
https://github.com/apache/spark/pull/8011
[SPARK-9709] [SQL] Avoid starving unsafe operators that use sort
The issue is that a task may run multiple sorts, and the sorts run by the
child operator (i.e., the parent RDD) may acquire all available memory, leaving
other sorts in the same task without enough memory to proceed. This manifests
itself as an `Unable to acquire X bytes of memory` error in `UnsafeExternalSorter`.
The solution is to reserve a page in each sorter in the chain before
computing the child operator's (parent RDD's) partitions. This requires a
new special RDD that performs some preparation before computing the parent's
partitions.
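The starvation and the fix can be illustrated with a toy model. This is a minimal sketch, not Spark code: `ToyMemoryPool`, `ToySorter`, and `runTask` are hypothetical names, memory is counted in whole pages, and the child operator's sort is assumed to grab every free page, which exaggerates what a real `UnsafeExternalSorter` does. The point it shows is that reserving one page for the downstream sorter before the child's partition is computed is enough for that sorter to make progress.

```scala
// Toy model only (hypothetical names, not Spark's API).
// A per-task pool that hands out memory in whole pages.
final class ToyMemoryPool(private var freePages: Int) {
  def tryAcquirePage(): Boolean =
    if (freePages > 0) { freePages -= 1; true } else false
}

// A toy sorter that needs at least one page before it can sort anything.
final class ToySorter(name: String, pool: ToyMemoryPool) {
  private var pages = 0

  // Reserve a single page; this is what the preparation step calls.
  def reservePage(): Boolean = {
    val ok = pool.tryAcquirePage()
    if (ok) pages += 1
    ok
  }

  // Sort the input. A greedy sorter (the child here) grabs every free page first.
  def sort(records: Seq[Int], greedy: Boolean): Seq[Int] = {
    if (greedy) while (pool.tryAcquirePage()) pages += 1
    if (pages == 0 && !reservePage())
      throw new IllegalStateException(s"$name: Unable to acquire a page of memory")
    records.sorted
  }
}

// One task running a downstream ("parent") sort on top of a child operator's sort.
// With prepareFirst = true the parent sorter reserves its page before the child runs.
def runTask(totalPages: Int, prepareFirst: Boolean): Either[String, Seq[Int]] = {
  val pool = new ToyMemoryPool(totalPages)
  val child = new ToySorter("child", pool)
  val parent = new ToySorter("parent", pool)
  try {
    if (prepareFirst) parent.reservePage()
    // The child's partition is computed before the parent consumes its output.
    val childOutput = child.sort(Seq(3, 1, 2), greedy = true)
    Right(parent.sort(childOutput.map(_ * 10), greedy = false))
  } catch {
    case e: IllegalStateException => Left(e.getMessage)
  }
}
```

With four pages, `runTask(4, prepareFirst = false)` should return a `Left` carrying the acquisition error for the parent sorter, while `runTask(4, prepareFirst = true)` returns the parent's sorted output.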
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andrewor14/spark unsafe-starve-memory
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8011.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8011
----
commit b889e08d240c099c18eb01c2565dbac94c0b52fd
Author: Andrew Or <[email protected]>
Date: 2015-08-06T19:04:30Z
MapPartitionsWithPreparationRDD
commit 234acbde642cd6bfc1ae30eea951eb58f5bd9706
Author: Andrew Or <[email protected]>
Date: 2015-08-06T21:45:47Z
Reserve a page in sorter when preparing each partition
commit 254032e75d5967ae7af7fc8cd6f9ef691f1c5bde
Author: Andrew Or <[email protected]>
Date: 2015-08-06T22:25:48Z
Add tests
The MapPartitionsWithPreparationRDDSuite simulates the condition
we are trying to fix, which is that the child can acquire memory
before the parent.
commit fdd3c92be8cef2f0ad1536676dca94fc6608fd82
Author: Andrew Or <[email protected]>
Date: 2015-08-06T22:26:46Z
Merge branch 'master' of github.com:apache/spark into unsafe-starve-memory
----
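The ordering that the new RDD relies on, and that `MapPartitionsWithPreparationRDDSuite` is described as exercising, can be checked in isolation. This is a hedged sketch, assuming the child operator acquires memory as soon as its partition is computed: `computeMapPartitions` stands in for a plain map-partitions step, whose function only runs after the parent's iterator has been built, and `computeWithPreparation` stands in for the preparation variant. Both are hypothetical helpers, not the actual `MapPartitionsWithPreparationRDD` API.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a plain map-partitions step: the parent's iterator is
// built (the argument is evaluated) before f runs, so anything f reserves comes too late.
def computeMapPartitions[T, U](parent: () => Iterator[T])(f: Iterator[T] => Iterator[U]): Iterator[U] =
  f(parent())

// Hypothetical stand-in for a map-partitions-with-preparation step:
// prepare runs first, and only then is the parent's iterator built.
def computeWithPreparation[T, U, M](parent: () => Iterator[T])(prepare: () => M)(
    execute: (M, Iterator[T]) => Iterator[U]): Iterator[U] = {
  val prepared = prepare()
  execute(prepared, parent())
}

// Records whether the parent's reservation happens before or after the child's acquisition.
def eventOrder(usePreparation: Boolean): List[String] = {
  val log = ArrayBuffer.empty[String]
  // The child operator grabs memory as soon as its partition is computed.
  val child: () => Iterator[Int] = () => { log += "child acquires memory"; Iterator(3, 1, 2) }
  val out =
    if (usePreparation)
      computeWithPreparation(child)(() => { log += "parent reserves page"; () })((_, it) => it.map(_ * 2))
    else
      computeMapPartitions(child) { it => log += "parent reserves page"; it.map(_ * 2) }
  out.toList // force the iterator so every step runs
  log.toList
}
```

Without preparation the child's acquisition is logged first; with preparation the parent's reservation comes first, which is the condition the fix needs.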