GitHub user mateiz opened a pull request:

    https://github.com/apache/spark/pull/1009

    [SPARK-2044] Pluggable interface for shuffles

    This is a first cut at moving shuffle logic behind a pluggable interface, 
as described at https://issues.apache.org/jira/browse/SPARK-2044, to let us 
more easily experiment with new shuffle implementations. It moves the existing 
shuffle code to a class HashShuffleManager behind a general ShuffleManager 
interface.
    
    Two things are still missing to make this complete:
    * MapOutputTracker needs to be hidden behind the ShuffleManager interface; 
this will also require adding methods to ShuffleManager that will let the 
DAGScheduler interact with it as it does with the MapOutputTracker today
    * The code to do map-side and reduce-side combine in ShuffledRDD, 
PairRDDFunctions, etc. needs to be moved into the ShuffleManager's readers and 
writers
    
    However, either of these may also be done later, after we merge the 
current interface.
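
For readers unfamiliar with the idea, here is a rough sketch of what a
pluggable shuffle interface with a hash-based implementation could look like.
It is not the code from this pull request: the method signatures, the
InMemoryHashShuffleManager class, and the in-memory bucket map are all
assumptions made for illustration. Only the names ShuffleManager and
ShuffleReader, and the notion of a writer, come from the description and
commits.

```scala
import scala.collection.mutable

// Hypothetical, simplified interfaces; NOT the signatures used in the pull request.
trait ShuffleWriter[K, V] {
  def write(records: Iterator[(K, V)]): Unit
}

trait ShuffleReader[K, V] {
  def read(): Iterator[(K, V)]
}

// The scheduler and RDD code would only talk to this trait, so that
// implementations can be swapped without touching the callers.
trait ShuffleManager {
  def getWriter[K, V](shuffleId: Int, mapId: Int, numPartitions: Int): ShuffleWriter[K, V]
  def getReader[K, V](shuffleId: Int, partition: Int): ShuffleReader[K, V]
}

// Toy hash-based implementation (an assumption for this sketch) that keeps
// all shuffle output in memory instead of writing files.
class InMemoryHashShuffleManager extends ShuffleManager {
  // (shuffleId, reducePartition) -> records written for that partition
  private val buckets =
    mutable.Map.empty[(Int, Int), mutable.ArrayBuffer[(Any, Any)]]

  override def getWriter[K, V](shuffleId: Int, mapId: Int, numPartitions: Int): ShuffleWriter[K, V] =
    new ShuffleWriter[K, V] {
      def write(records: Iterator[(K, V)]): Unit = buckets.synchronized {
        records.foreach { case (k, v) =>
          // Non-negative modulo of the key's hash picks the reduce partition.
          val p = ((k.## % numPartitions) + numPartitions) % numPartitions
          buckets.getOrElseUpdate((shuffleId, p), mutable.ArrayBuffer.empty) += ((k, v))
        }
      }
    }

  override def getReader[K, V](shuffleId: Int, partition: Int): ShuffleReader[K, V] =
    new ShuffleReader[K, V] {
      def read(): Iterator[(K, V)] = buckets.synchronized {
        // Copy the bucket so the caller iterates over a stable snapshot.
        buckets.get((shuffleId, partition)).map(_.toList).getOrElse(Nil)
          .iterator.map(_.asInstanceOf[(K, V)])
      }
    }
}
```

In this sketch a caller would hold a reference typed as ShuffleManager, ask it
for a writer on the map side and a reader on the reduce side, and never name
the concrete class, which is the property that makes it easy to try out
alternative shuffle implementations.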

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mateiz/spark pluggable-shuffle

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1009.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1009
    
----
commit 75cc0446d5dc3b605961c0c41d036fab4ee8e9e8
Author: Matei Zaharia <[email protected]>
Date:   2014-06-06T05:09:37Z

    Partial work to move hash shuffle in

commit 55c77175dc307030285a541ca2c47f31c05a207a
Author: Matei Zaharia <[email protected]>
Date:   2014-06-06T21:19:33Z

    Changed RDD code to use ShuffleReader

commit f6f011d7a76c7ef2c6de20afd9bdeb8007ca56bb
Author: Matei Zaharia <[email protected]>
Date:   2014-06-06T21:56:54Z

    Move hash shuffle reader behind ShuffleManager interface

commit 4f681ba7ebee7e8ad0b8cd5f8d692f2aaa910992
Author: Matei Zaharia <[email protected]>
Date:   2014-06-08T05:41:36Z

    Move write part of ShuffleMapTask to ShuffleManager

commit ac56831d24e87d8d445eaed30297e4cf3fb73deb
Author: Matei Zaharia <[email protected]>
Date:   2014-06-08T05:53:18Z

    Bug fix and better error message

----


