GitHub user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1561#issuecomment-50200148
  
    Many of the methods and data structures in Stage apply to only one kind 
of stage, either ShuffleMapStage or ResultStage.  For example, we only track 
output locations for ShuffleMapStages and only have ActiveJobs for 
ResultStages.  What do you think about splitting Stage into a trait and two 
subclasses?  This could make other DAGScheduler data structures easier to 
understand; for example, we have
    
    ```scala
    shuffleToMapStage = new HashMap[Int, Stage]
    ```
    
    which only holds ShuffleMapStages, so we could give it a more specific type 
of `HashMap[Int, ShuffleMapStage]`.
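    
    As a rough sketch of what I have in mind (simplified and hypothetical: 
the fields, the `addOutputLoc` / `isAvailable` helpers, the `String` host 
names, the `jobId` stand-in for ActiveJob, and the `StageSketch` wrapper are 
all assumptions for illustration, not the existing code), the split might 
look something like:
    
    ```scala
    import scala.collection.mutable.HashMap

    // Hypothetical, simplified sketch; these are not the actual Spark classes.
    sealed trait Stage {
      def id: Int
      def numPartitions: Int
    }

    // Only shuffle map stages track where their map outputs live.
    final class ShuffleMapStage(val id: Int, val numPartitions: Int) extends Stage {
      // outputLocs(p) lists the hosts holding output for partition p.
      // A plain String stands in for a richer status type (assumption).
      private val outputLocs: Array[List[String]] = Array.fill(numPartitions)(Nil)

      def addOutputLoc(partition: Int, host: String): Unit =
        outputLocs(partition) = host :: outputLocs(partition)

      // True once every partition has at least one known output location.
      def isAvailable: Boolean = outputLocs.forall(_.nonEmpty)
    }

    // Only result stages are tied to a job; jobId stands in for ActiveJob (assumption).
    final class ResultStage(val id: Int, val numPartitions: Int, val jobId: Int) extends Stage

    object StageSketch {
      // The map's value type now documents that it only holds ShuffleMapStages.
      val shuffleToMapStage = new HashMap[Int, ShuffleMapStage]
    }
    ```
    
    With the narrower value type, callers could use ShuffleMapStage-only 
members on lookups from `shuffleToMapStage` without casting or checking 
which kind of stage they got back.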
    


