GitHub user ilganeli opened a pull request:

    https://github.com/apache/spark/pull/4703

    [SPARK-4655] Split Stage into ShuffleMapStage and ResultStage subclasses

    Hi all - this patch includes two main efforts:
    
    1) I've split up Stage into ShuffleMapStage and ResultStage and updated their usage within DAGScheduler.
    2) While doing this, I ended up cleaning up many of the long and awkward functions within DAGScheduler, since I needed to do that to make sense of the code. I believe this significantly improves the code's readability; the only problem is that these changes are hard to diff side by side (particularly the updates to handleFetchFailure()).
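A minimal sketch of the shape item 1 describes, assuming a heavily simplified Stage. The class names and `outputLocs` come from this PR. The constructor parameters, the `addOutputLoc`/`removeOutputLoc`/`isAvailable` members as written here, and the use of plain host-name strings for output locations are hypothetical illustrations, not Spark's actual definitions.

```scala
// Simplified, HYPOTHETICAL model of the Stage split; NOT Spark's actual classes.
// Plain host-name strings stand in for the scheduler's real map-output status objects.

// State that every stage needs stays in the abstract base class.
abstract class Stage(val id: Int, val numPartitions: Int)

// Only map stages write shuffle output, so only they track output locations.
class ShuffleMapStage(stageId: Int, partitions: Int, val shuffleId: Int)
    extends Stage(stageId, partitions) {

  // One list of locations per output partition (an empty list = not yet computed).
  private val outputLocs = Array.fill(numPartitions)(List.empty[String])
  private var numAvailableOutputs = 0

  def addOutputLoc(partition: Int, host: String): Unit = {
    val prev = outputLocs(partition)
    outputLocs(partition) = host :: prev
    if (prev.isEmpty) numAvailableOutputs += 1
  }

  def removeOutputLoc(partition: Int, host: String): Unit = {
    val prev = outputLocs(partition)
    val next = prev.filterNot(_ == host)
    outputLocs(partition) = next
    if (prev.nonEmpty && next.isEmpty) numAvailableOutputs -= 1
  }

  // True once every partition has at least one recorded location.
  def isAvailable: Boolean = numAvailableOutputs == numPartitions
}

// Result stages run the final action and produce no shuffle output.
class ResultStage(stageId: Int, partitions: Int) extends Stage(stageId, partitions)
```

Keeping `outputLocs` private to ShuffleMapStage means a ResultStage simply has no such field, so any code that touches output locations must first establish that it holds a ShuffleMapStage.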
    
    I wanted to confirm that it's appropriate to move the outputLocs variable and its associated methods into ShuffleMapStage. The code of concern is in completeStage() and handleFailedTasks(), where I wanted to confirm that the stage is guaranteed to be a ShuffleMapStage.
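One way to make the assumption in completeStage() and handleFailedTasks() explicit is to pattern-match on the stage type instead of casting. The sketch below is hypothetical: `recordTaskOutput`, the minimal `sealed` hierarchy, and the string host names are illustrations, not DAGScheduler code.

```scala
// HYPOTHETICAL, minimal hierarchy for illustrating the type check; NOT Spark's code.
// Declaring the base class `sealed` lets the compiler warn if a match misses a subclass.
sealed abstract class Stage(val id: Int)

final class ShuffleMapStage(stageId: Int) extends Stage(stageId) {
  private var outputLocs = Map.empty[Int, List[String]]

  def addOutputLoc(partition: Int, host: String): Unit =
    outputLocs = outputLocs.updated(partition, host :: outputLocs.getOrElse(partition, Nil))

  def locationsOf(partition: Int): List[String] = outputLocs.getOrElse(partition, Nil)
}

final class ResultStage(stageId: Int) extends Stage(stageId)

object TaskOutputHandling {
  // Hypothetical stand-in for the bookkeeping done when a task finishes. Only a
  // ShuffleMapStage has outputLocs, so the match states that requirement instead of
  // casting. Returns true if a location was recorded.
  def recordTaskOutput(stage: Stage, partition: Int, host: String): Boolean =
    stage match {
      case sms: ShuffleMapStage =>
        sms.addOutputLoc(partition, host)
        true
      case _: ResultStage =>
        false // result tasks produce no shuffle output to register
    }
}
```

If review confirms that these paths only ever see map stages, replacing the ResultStage case with one that throws an IllegalStateException would document that invariant and fail loudly if it is ever violated.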
    
    I believe I've split the functionality between ResultStage and ShuffleMapStage appropriately, but I wanted to make sure I'm not missing something.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ilganeli/spark SPARK-4653

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4703.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4703
    
----
commit 424fdef9578b4450ec75c8d84db98927e4b15831
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2015-02-18T01:19:20Z

    Began refactoring Stage.scala

commit 562b688d07a9404196c6856b5f3302c31b05bb2d
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2015-02-19T00:54:36Z

    Significant refactoring within DAG Scheduler class

commit 75fe74f616adfa4e2da7a439fc86f784265aae84
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2015-02-19T00:58:34Z

    Merge remote-tracking branch 'upstream/master' into SPARK-4653

commit 5e76fb8b21ba877afe4a9255036b8120691aeda8
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2015-02-19T01:06:23Z

    Small updates

commit cb66a4fdf23883c24a7684b6f20ca532c74241a7
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2015-02-19T18:08:14Z

    Further refactoring. Moved outputLocs and associated functions to inside ShuffleMapStage.

commit e3e7ea23584d3fb442ede54231aa048815a8f436
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2015-02-19T18:11:15Z

    Minor formatting

commit 149b9a13e48371a1ea06a3fc85374fa026368cd8
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2015-02-19T22:06:21Z

    Refactored more extremely large methods to be more modular. Cleaned up some usages of vs nonEmpty.

commit be3e01fad73a3d1124ad090acbef6f38379afc4e
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2015-02-19T22:58:56Z

    Minor fixes

commit 00182c7e7f9a38d5eee86065171804acc52e6d2a
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2015-02-20T07:43:59Z

    Cleaned up handleTaskCompletion more

----


