GitHub user mateiz opened a pull request:
https://github.com/apache/spark/pull/8844
[SPARK-9852] Let reduce tasks fetch multiple map output partitions
This makes two changes:
- Allow reduce tasks to fetch multiple map output partitions -- this is a
pretty small change to HashShuffleReader
- Move shuffle locality computation out of DAGScheduler and into
ShuffledRDD / MapOutputTracker; this was needed because the code in
DAGScheduler wouldn't work for RDDs that fetch multiple map output partitions
in each reduce task
I also added an AdaptiveSchedulingSuite that creates RDDs depending on
multiple map output partitions.
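As a rough illustration of the two changes described above, here is a minimal sketch in plain Scala. It is not the code from this pull request and does not use Spark's classes: MapStatusSketch, blocksToFetch, preferredHosts, and the fractionThreshold parameter are hypothetical names chosen for the example. It assumes a reduce task covers a half-open range of reduce partitions, and that locality is decided by how many of the bytes in that range each host holds.

```scala
// Hypothetical model for illustration only; these are not Spark's classes.
object MultiPartitionShuffleSketch {

  // Output of one map task: the host it ran on and the size in bytes of the
  // block it wrote for each reduce partition (index = reduce partition id).
  final case class MapStatusSketch(host: String, sizes: Vector[Long])

  // All (mapId, reduceId) blocks that a reduce task must fetch when it covers
  // the reduce partitions in [startPartition, endPartition).
  def blocksToFetch(numMaps: Int, startPartition: Int, endPartition: Int): Seq[(Int, Int)] =
    for {
      mapId    <- 0 until numMaps
      reduceId <- startPartition until endPartition
    } yield (mapId, reduceId)

  // Hosts that hold at least `fractionThreshold` of the bytes in the range.
  // Because the sizes are summed over the whole range, the same computation
  // works whether the task reads one partition or several.
  def preferredHosts(
      statuses: Seq[MapStatusSketch],
      startPartition: Int,
      endPartition: Int,
      fractionThreshold: Double): Seq[String] = {
    val bytesByHost: Map[String, Long] = statuses
      .groupBy(_.host)
      .map { case (host, onHost) =>
        host -> onHost.map(_.sizes.slice(startPartition, endPartition).sum).sum
      }
    val total = bytesByHost.values.sum
    if (total == 0L) Seq.empty
    else bytesByHost.collect {
      case (host, bytes) if bytes.toDouble / total >= fractionThreshold => host
    }.toSeq.sorted
  }
}
```

With this shape, a reduce task that reads partitions 1 and 2 from two map tasks would call blocksToFetch(2, 1, 3), and the scheduler-side caller would pass the same range to preferredHosts. Keeping the range in both calls is the point of the sketch: a locality computation that assumes exactly one reduce partition per task cannot describe such a task.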
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mateiz/spark spark-9852
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8844.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8844
----
commit cbf6a5a78c419b8bb26e6259849f1c35a2d31edb
Author: Matei Zaharia
Date: 2015-08-13T23:35:49Z
Allow HashShuffleReader to fetch multiple partitions
commit 8f42d5c036b9b0985c89cdcf43f66f9f5eec6f3f
Author: Matei Zaharia
Date: 2015-08-20T22:13:54Z
Compute reduce locality only for ShuffledRDD and its SQL counterpart
commit e4a6f5f547788d03c1e0a373fc2f091571c1d12b
Author: Matei Zaharia
Date: 2015-08-20T22:23:05Z
More testing
commit 9ab02f17bae711dbd2e3979f8e64863fb84cbd81
Author: Matei Zaharia
Date: 2015-09-20T16:09:07Z
Fix compile
commit f4d2519bc3467d595dcd293a2b54157664222630
Author: Matei Zaharia
Date: 2015-09-20T22:24:34Z
Some test fixes
----