Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1561#issuecomment-50200148
Many of the methods and data structures in Stage apply only to
ShuffleMapStages or only to ResultStages. For example, we track output
locations only for ShuffleMapStages and have ActiveJobs only for
ResultStages. What do you think about splitting Stage into a trait and two
subclasses? This could make other DAGScheduler data structures easier to
understand; for example, we have
```scala
shuffleToMapStage = new HashMap[Int, Stage]
```
which only holds ShuffleMapStages, so we could give it a more specific type
of `HashMap[Int, ShuffleMapStage]`.
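
To make the suggestion concrete, here is a rough sketch of what the split might look like. It is not the actual Spark code: the fields, the `addOutputLoc` and `isAvailable` helpers, and the use of plain host-name strings and a bare `jobId` in place of Spark's real output-status and ActiveJob types are all simplifying assumptions made for illustration.

```scala
import scala.collection.mutable.HashMap

// Hypothetical, simplified stand-ins; not Spark's actual classes.
sealed trait Stage {
  def id: Int
  def numPartitions: Int
}

// Only shuffle map stages track where their outputs live.
final class ShuffleMapStage(val id: Int, val numPartitions: Int, val shuffleId: Int)
    extends Stage {
  // One list of output locations (plain host names here, an assumption) per partition.
  private val outputLocs: Array[List[String]] =
    Array.fill(numPartitions)(List.empty[String])

  def addOutputLoc(partition: Int, host: String): Unit =
    outputLocs(partition) = host :: outputLocs(partition)

  // Available once every partition has at least one known output location.
  def isAvailable: Boolean = outputLocs.forall(_.nonEmpty)
}

// Only result stages are tied to a job (a bare jobId stands in for ActiveJob).
final class ResultStage(val id: Int, val numPartitions: Int, val jobId: Int)
    extends Stage

object StageSketch {
  // The more specific value type documents, and enforces, what the map holds.
  val shuffleToMapStage = new HashMap[Int, ShuffleMapStage]

  def register(stage: ShuffleMapStage): Unit =
    shuffleToMapStage(stage.shuffleId) = stage
}
```

With the narrower map type, a lookup returns a `ShuffleMapStage` directly, so callers can use the output-location methods without a cast or a runtime check, and an attempt to store a `ResultStage` in the map would fail to compile.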