Aaron Staple created SPARK-2581:
-----------------------------------

             Summary: complete or withdraw visitedStages optimization in DAGScheduler’s stageDependsOn
                 Key: SPARK-2581
                 URL: https://issues.apache.org/jira/browse/SPARK-2581
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Aaron Staple
            Priority: Minor

Right now the visitedStages HashSet in stageDependsOn is populated with stages but never queried, so it does nothing to limit re-examination of previously visited stages. It may make sense to check whether a mapStage has already been visited before visiting it again, as the nearby visitedRdds check does for RDDs. Alternatively, the existing visitedRdds check may already optimize this function sufficiently, in which case visitedStages can simply be removed.

See discussion here: https://github.com/apache/spark/pull/1362#discussion-diff-15018046L1107
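
To make the two options concrete, here is a minimal sketch of the traversal. It is not the actual DAGScheduler code: Rdd, Stage, Narrow, Shuffle, the checkStages flag, and the stageVisits counter are hypothetical stand-ins invented for illustration, and the real stageDependsOn returns only a Boolean. With checkStages = false, visitedStages is written but never read, as described above. With checkStages = true, a mapStage that was already seen is skipped.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's RDD, Stage, and dependency types.
sealed trait Dep
final class Rdd(val name: String, val deps: List[Dep] = Nil)
final class Stage(val rdd: Rdd, val isAvailable: Boolean)
final case class Narrow(rdd: Rdd) extends Dep
final case class Shuffle(mapStage: Stage) extends Dep

object StageDependsOnSketch {
  // Returns (does `stage` depend on `target`?, how many times a map stage was followed).
  // The counter exists only to make the effect of the extra check observable.
  def stageDependsOn(stage: Stage, target: Stage, checkStages: Boolean): (Boolean, Int) = {
    if (stage eq target) {
      (true, 0)
    } else {
      val visitedRdds = mutable.HashSet[Rdd]()
      val visitedStages = mutable.HashSet[Stage]()
      var stageVisits = 0

      def visit(rdd: Rdd): Unit = {
        // add returns false if the RDD was already present, which ends the recursion.
        if (visitedRdds.add(rdd)) {
          rdd.deps.foreach {
            case Shuffle(mapStage) =>
              // Option 1 (checkStages = true): actually query visitedStages.
              // Option 2 (checkStages = false): rely on visitedRdds alone.
              val unseen = !checkStages || !visitedStages.contains(mapStage)
              if (!mapStage.isAvailable && unseen) {
                visitedStages += mapStage
                stageVisits += 1
                visit(mapStage.rdd)
              } // An available map stage does not need to be followed further.
            case Narrow(parent) =>
              visit(parent)
          }
        }
      }

      visit(stage.rdd)
      (visitedRdds.contains(target.rdd), stageVisits)
    }
  }
}
```

In this model, when one RDD has two shuffle dependencies on the same unavailable mapStage, both variants return the same Boolean answer. The stage check only avoids a second call to visit(mapStage.rdd), which the visitedRdds check would reject immediately anyway. That is consistent with the possibility raised above that visitedStages can simply be removed, though this simplified model is not evidence about the real scheduler.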