Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/8269#discussion_r37840487
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -741,18 +742,18 @@ class DAGScheduler(
val stageInfos = stageIds.flatMap(id => stageIdToStage.get(id).map(_.latestInfo))
listenerBus.post(
SparkListenerJobStart(job.jobId, jobSubmissionTime, stageInfos, properties))
- submitStage(finalStage)
+ submitStage(finalStage, Some(missingStages))
}
submitWaitingStages()
}
/** Submits stage, but first recursively submits any missing parents. */
- private def submitStage(stage: Stage) {
+ private def submitStage(stage: Stage, missingStages: Option[List[Stage]] = None) {
--- End diff --
Several recursive calls to submitStage don't pass this value, so they
recompute it anyway? To avoid the complication, would it be simpler to have
all callers provide this info? I think it's only one more change.
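
To make the point concrete, here is a rough, self-contained sketch and not
the actual DAGScheduler code. `Stage`, `missingParents`, `submitWithDefault`
and `submitExplicit` are hypothetical stand-ins I made up for illustration,
and the lookup counter exists only to show where recomputation happens.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark's classes.
final case class Stage(id: Int, parents: List[Stage])

object SubmitSketch {
  // Counts how often the lookup of missing parents runs.
  var lookups = 0

  // Stand-in for the missing-parent lookup: here every parent counts as missing.
  def missingParents(stage: Stage): List[Stage] = {
    lookups += 1
    stage.parents
  }

  // Shape from the diff: optional argument that defaults to None.
  def submitWithDefault(stage: Stage, missing: Option[List[Stage]] = None): Unit = {
    val parents = missing.getOrElse(missingParents(stage))
    // The recursive call passes nothing, so each parent recomputes its own list.
    parents.foreach(p => submitWithDefault(p))
  }

  // Alternative: no default, so every caller (including the recursion) supplies the list.
  def submitExplicit(stage: Stage, missing: List[Stage]): Unit = {
    missing.foreach(p => submitExplicit(p, missingParents(p)))
  }
}
```

On a three-stage chain both variants do the same number of lookups when the
top-level caller already has the list, so the default argument only helps
that one call site; the explicit signature just makes the requirement visible
to every caller.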