GitHub user ilganeli opened a pull request:
https://github.com/apache/spark/pull/4703
[SPARK-4655] Split Stage into ShuffleMapStage and ResultStage subclasses
Hi all - this patch includes two main efforts:
1) I've split up Stage into ShuffleMapStage and ResultStage and updated
their usage within DAGScheduler.
2) While doing this, I ended up cleaning up many of the long and awkward
functions within DAGScheduler, since I needed to do that to make sense of
the code. I believe this improves readability significantly; the only
problem is that those changes are hard to diff side by side (particularly
the updates to handleFetchFailure()).
I wanted to confirm that it's appropriate to move the outputLocs variable
and its associated methods into ShuffleMapStage. The places of concern are
completeStage() and handleFailedTasks(), where I want to confirm that the
stage is guaranteed to be a ShuffleMapStage.
I believe I've split up the functionality of ResultStage and
ShuffleMapStage appropriately, but I wanted to make sure I'm not missing
something.
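The split described above can be pictured with a heavily simplified Scala sketch. It is hypothetical and does not reproduce the classes in this patch or in Spark. Real stages also carry an RDD, parent stages, and job ids, and Spark records map-status objects rather than plain host names. The sketch shows only the idea under discussion: outputLocs and its helpers live on ShuffleMapStage alone, so any code that touches them (such as completeStage()) must first establish that it holds a ShuffleMapStage. The helper name recordTaskSuccess is invented for illustration.

```scala
// Hypothetical, heavily simplified model of the Stage split; not Spark's real API.
// Real stages also carry an RDD, parent stages, job ids, and more.
sealed abstract class Stage(val id: Int, val numPartitions: Int)

// Only shuffle map stages produce map output that later stages fetch, so the
// per-partition output locations (outputLocs) and their helpers live here.
class ShuffleMapStage(stageId: Int, partitions: Int)
    extends Stage(stageId, partitions) {
  // outputLocs(p): hosts holding output for partition p (plain host names
  // are an assumption of this sketch; Spark tracks richer map statuses).
  val outputLocs: Array[List[String]] =
    Array.fill(numPartitions)(List.empty[String])
  private var numAvailableOutputs = 0

  // The stage's output is complete once every partition has a location.
  def isAvailable: Boolean = numAvailableOutputs == numPartitions

  def addOutputLoc(partition: Int, host: String): Unit = {
    val prev = outputLocs(partition)
    outputLocs(partition) = host :: prev
    if (prev.isEmpty) numAvailableOutputs += 1
  }

  def removeOutputLoc(partition: Int, host: String): Unit = {
    val prev = outputLocs(partition)
    val next = prev.filterNot(_ == host)
    outputLocs(partition) = next
    if (prev.nonEmpty && next.isEmpty) numAvailableOutputs -= 1
  }
}

// The final stage of a job computes results directly and has no map output.
class ResultStage(stageId: Int, partitions: Int)
    extends Stage(stageId, partitions)

// Hypothetical helper: callers must establish the concrete stage type before
// touching outputLocs; because Stage is sealed, the compiler checks that both
// subclasses are handled.
def recordTaskSuccess(stage: Stage, partition: Int, host: String): Unit =
  stage match {
    case s: ShuffleMapStage => s.addOutputLoc(partition, host)
    case _: ResultStage     => () // nothing to record for result tasks
  }
```

Declaring Stage as sealed makes the compiler warn about any match that forgets a subclass. This is one way to turn the question "is this stage guaranteed to be a ShuffleMapStage?" into an explicit, checked case rather than a cast.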
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ilganeli/spark SPARK-4653
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4703.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4703
----
commit 424fdef9578b4450ec75c8d84db98927e4b15831
Author: Ilya Ganelin
Date: 2015-02-18T01:19:20Z
Began refactoring Stage.scala
commit 562b688d07a9404196c6856b5f3302c31b05bb2d
Author: Ilya Ganelin
Date: 2015-02-19T00:54:36Z
Significant refactoring within DAG Scheduler class
commit 75fe74f616adfa4e2da7a439fc86f784265aae84
Author: Ilya Ganelin
Date: 2015-02-19T00:58:34Z
Merge remote-tracking branch 'upstream/master' into SPARK-4653
commit 5e76fb8b21ba877afe4a9255036b8120691aeda8
Author: Ilya Ganelin
Date: 2015-02-19T01:06:23Z
Small updates
commit cb66a4fdf23883c24a7684b6f20ca532c74241a7
Author: Ilya Ganelin
Date: 2015-02-19T18:08:14Z
Further refactoring. Moved outputLocs and associated functions to inside
ShuffleMapStage.
commit e3e7ea23584d3fb442ede54231aa048815a8f436
Author: Ilya Ganelin
Date: 2015-02-19T18:11:15Z
Minor formatting
commit 149b9a13e48371a1ea06a3fc85374fa026368cd8
Author: Ilya Ganelin
Date: 2015-02-19T22:06:21Z
Refactored more extremely large methods to be more modular. Cleaned up some
usages of vs nonEmpty.
commit be3e01fad73a3d1124ad090acbef6f38379afc4e
Author: Ilya Ganelin
Date: 2015-02-19T22:58:56Z
Minor fixes
commit 00182c7e7f9a38d5eee86065171804acc52e6d2a
Author: Ilya Ganelin
Date: 2015-02-20T07:43:59Z
Cleaned up handleTaskCompletion more
----