GitHub user mateiz opened a pull request:
https://github.com/apache/spark/pull/1009
[SPARK-2044] Pluggable interface for shuffles
This is a first cut at moving shuffle logic behind a pluggable interface,
as described at https://issues.apache.org/jira/browse/SPARK-2044, to let us
more easily experiment with new shuffle implementations. It moves the existing
shuffle code to a class HashShuffleManager behind a general ShuffleManager
interface.
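As a rough illustration of what "pluggable" means here, the sketch below shows one way such an interface could be shaped. It is not the code in this pull request: the method signatures, the ShuffleWriter and ShuffleReader shapes, and the toy InMemoryHashShuffleManager are all assumptions made for this example, and a real implementation would write to disk and fetch blocks over the network.

```scala
import scala.collection.mutable

// Hypothetical, simplified shapes; not the signatures used in the pull request.
trait ShuffleWriter[K, V] {
  def write(records: Iterator[(K, V)]): Unit
}

trait ShuffleReader[K, V] {
  def read(): Iterator[(K, V)]
}

// Map tasks ask the manager for a writer, reduce tasks ask it for a reader.
trait ShuffleManager {
  def getWriter[K, V](shuffleId: Int, mapId: Int, numPartitions: Int): ShuffleWriter[K, V]
  def getReader[K, V](shuffleId: Int, partition: Int): ShuffleReader[K, V]
}

// Toy stand-in for a hash-based implementation: records are bucketed in memory
// by the hash of their key.
class InMemoryHashShuffleManager extends ShuffleManager {
  // (shuffleId, reduce partition) -> records written for that partition
  private val buckets =
    mutable.Map.empty[(Int, Int), mutable.ArrayBuffer[(Any, Any)]]

  override def getWriter[K, V](shuffleId: Int, mapId: Int, numPartitions: Int): ShuffleWriter[K, V] =
    new ShuffleWriter[K, V] {
      override def write(records: Iterator[(K, V)]): Unit =
        records.foreach { case (k, v) =>
          // floorMod keeps the partition index non-negative for negative hash codes
          val p = Math.floorMod(k.hashCode, numPartitions)
          buckets.getOrElseUpdate((shuffleId, p), mutable.ArrayBuffer.empty[(Any, Any)]) += ((k, v))
        }
    }

  override def getReader[K, V](shuffleId: Int, partition: Int): ShuffleReader[K, V] =
    new ShuffleReader[K, V] {
      override def read(): Iterator[(K, V)] =
        buckets
          .getOrElse((shuffleId, partition), mutable.ArrayBuffer.empty[(Any, Any)])
          .iterator
          .map { case (k, v) => (k.asInstanceOf[K], v.asInstanceOf[V]) }
    }
}
```

Callers depend only on the ShuffleManager trait, so a different implementation could be substituted without touching the code that requests writers and readers.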
Two things are still missing to make this complete:
* MapOutputTracker needs to be hidden behind the ShuffleManager interface;
this will also require adding methods to ShuffleManager that will let the
DAGScheduler interact with it as it does with the MapOutputTracker today
* The code to do map-side and reduce-side combine in ShuffledRDD,
PairRDDFunctions, etc. needs to be moved into the ShuffleManager's readers and
writers
However, either of these could also be done later, after we merge the current
interface.
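To make the second item concrete, here is a minimal sketch of a writer that performs the map-side combine itself, so that the RDD layer would not have to. All names here (RecordWriter, Combiner, CombiningWriter) are hypothetical and chosen for this example; they are not taken from the pull request or from Spark's API.

```scala
import scala.collection.mutable

// Hypothetical names for illustration only; not taken from the pull request.
trait RecordWriter[K, C] {
  def write(records: Iterator[(K, C)]): Unit
}

// Simplified stand-in for an aggregator: how to start and how to extend a
// combined value.
final case class Combiner[V, C](create: V => C, merge: (C, V) => C)

// A writer that combines values per key before delegating to the underlying
// writer, so callers hand it raw (key, value) records.
class CombiningWriter[K, V, C](combiner: Combiner[V, C], underlying: RecordWriter[K, C]) {
  def write(records: Iterator[(K, V)]): Unit = {
    // LinkedHashMap keeps first-seen key order, which makes the output deterministic
    val combined = mutable.LinkedHashMap.empty[K, C]
    records.foreach { case (k, v) =>
      combined(k) = combined.get(k) match {
        case Some(c) => combiner.merge(c, v)
        case None    => combiner.create(v)
      }
    }
    underlying.write(combined.iterator)
  }
}
```

This version holds every combined value in memory, which is an assumption of the sketch; a production writer would need to handle data that does not fit.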
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mateiz/spark pluggable-shuffle
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1009.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1009
----
commit 75cc0446d5dc3b605961c0c41d036fab4ee8e9e8
Author: Matei Zaharia <[email protected]>
Date: 2014-06-06T05:09:37Z
Partial work to move hash shuffle in
commit 55c77175dc307030285a541ca2c47f31c05a207a
Author: Matei Zaharia <[email protected]>
Date: 2014-06-06T21:19:33Z
Changed RDD code to use ShuffleReader
commit f6f011d7a76c7ef2c6de20afd9bdeb8007ca56bb
Author: Matei Zaharia <[email protected]>
Date: 2014-06-06T21:56:54Z
Move hash shuffle reader behind ShuffleManager interface
commit 4f681ba7ebee7e8ad0b8cd5f8d692f2aaa910992
Author: Matei Zaharia <[email protected]>
Date: 2014-06-08T05:41:36Z
Move write part of ShuffleMapTask to ShuffleManager
commit ac56831d24e87d8d445eaed30297e4cf3fb73deb
Author: Matei Zaharia <[email protected]>
Date: 2014-06-08T05:53:18Z
Bug fix and better error message
----