GitHub user ilganeli opened a pull request: https://github.com/apache/spark/pull/4703
[SPARK-4655] Split Stage into ShuffleMapStage and ResultStage subclasses

Hi all, this patch includes two main efforts:

1) I've split up Stage into ShuffleMapStage and ResultStage and updated their usage within DAGScheduler.
2) While doing this, I ended up cleaning up many of the long and awkward functions within DAGScheduler, since I needed to do that to make sense of the code. I believe this significantly improved the readability of the code. The only problem is that these changes are hard to diff side by side (particularly the updates to handleFetchFailure()).

I wanted to confirm that it's appropriate to move the outputLocs variable and its associated methods into ShuffleMapStage. The code of concern is in the completeStage() and handleFailedTasks() functions, where I wanted to confirm that the stage is guaranteed to be a ShuffleMapStage.

I believe I've split up the functionality of ResultStage and ShuffleMapStage appropriately, but I wanted to make sure I'm not missing something.
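To illustrate the kind of split described above, here is a minimal, self-contained sketch. It is a simplified, hypothetical model rather than Spark's actual code: the constructor parameters, the String-based output locations, and the `onTaskSucceeded` handler are assumptions made for illustration. It shows `outputLocs` living only on `ShuffleMapStage`, with a pattern match in a scheduler-style handler making the "this must be a ShuffleMapStage" assumption explicit.

```scala
// Hypothetical, simplified model of the Stage split; NOT Spark's real signatures.

// State shared by every kind of stage.
sealed abstract class Stage(val id: Int, val numPartitions: Int)

// A stage whose tasks write shuffle output. Only this kind of stage needs to
// track where each partition's map output is stored.
final class ShuffleMapStage(stageId: Int, partitions: Int, val shuffleId: Int)
    extends Stage(stageId, partitions) {

  // outputLocs(p) lists the locations (here, plain executor ids) holding
  // map output for partition p.
  val outputLocs: Array[List[String]] = Array.fill[List[String]](numPartitions)(Nil)

  def addOutputLoc(partition: Int, loc: String): Unit =
    outputLocs(partition) = loc :: outputLocs(partition)

  def removeOutputLoc(partition: Int, loc: String): Unit =
    outputLocs(partition) = outputLocs(partition).filterNot(_ == loc)

  def numAvailableOutputs: Int = outputLocs.count(_.nonEmpty)

  // The stage is complete once every partition has at least one output location.
  def isAvailable: Boolean = numAvailableOutputs == numPartitions
}

// A final stage that computes results for an action; it has no shuffle output.
final class ResultStage(stageId: Int, partitions: Int)
    extends Stage(stageId, partitions)

object StageDemo {
  // Hypothetical handler: output-location bookkeeping applies only to
  // ShuffleMapStage. Because Stage is sealed, the compiler checks that the
  // match covers both subclasses.
  def onTaskSucceeded(stage: Stage, partition: Int, execId: String): Unit =
    stage match {
      case s: ShuffleMapStage => s.addOutputLoc(partition, execId)
      case _: ResultStage     => () // results go to the job, not to outputLocs
    }

  def main(args: Array[String]): Unit = {
    val mapStage = new ShuffleMapStage(0, 2, 7)
    val resultStage = new ResultStage(1, 2)
    onTaskSucceeded(mapStage, 0, "exec-1")
    onTaskSucceeded(mapStage, 1, "exec-2")
    onTaskSucceeded(resultStage, 0, "exec-1")
    println(s"map stage available: ${mapStage.isAvailable}")
  }
}
```

Running `StageDemo.main` prints `map stage available: true`. Making the base class `sealed` is one way to turn the concern above (whether a given stage is guaranteed to be a ShuffleMapStage) into something the type system and match-exhaustiveness checks can help verify.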
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ilganeli/spark SPARK-4653

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4703.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4703

----
commit 424fdef9578b4450ec75c8d84db98927e4b15831
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date: 2015-02-18T01:19:20Z

    Began refactoring Stage.scala

commit 562b688d07a9404196c6856b5f3302c31b05bb2d
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date: 2015-02-19T00:54:36Z

    Significant refactoring within DAG Scheduler class

commit 75fe74f616adfa4e2da7a439fc86f784265aae84
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date: 2015-02-19T00:58:34Z

    Merge remote-tracking branch 'upstream/master' into SPARK-4653

commit 5e76fb8b21ba877afe4a9255036b8120691aeda8
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date: 2015-02-19T01:06:23Z

    Small updates

commit cb66a4fdf23883c24a7684b6f20ca532c74241a7
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date: 2015-02-19T18:08:14Z

    Further refactoring. Moved outputLocs and associated functions to inside ShuffleMapStage.

commit e3e7ea23584d3fb442ede54231aa048815a8f436
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date: 2015-02-19T18:11:15Z

    Minor formatting

commit 149b9a13e48371a1ea06a3fc85374fa026368cd8
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date: 2015-02-19T22:06:21Z

    Refactored more extremely large methods to be more modular. Cleaned up some usages of vs nonEmpty.

commit be3e01fad73a3d1124ad090acbef6f38379afc4e
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date: 2015-02-19T22:58:56Z

    Minor fixes

commit 00182c7e7f9a38d5eee86065171804acc52e6d2a
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date: 2015-02-20T07:43:59Z

    Cleaned up hanleTaskCompletion more
----