Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/12060#discussion_r58169664
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -750,23 +764,20 @@ class DAGScheduler(
submitStage(stage)
}
}
- submitWaitingStages()
}
/**
* Check for waiting stages which are now eligible for resubmission.
- * Ordinarily run on every iteration of the event loop.
+ * Ordinarily run after the parent stage completed successfully.
*/
- private def submitWaitingStages() {
- // TODO: We might want to run this less often, when we are sure that
something has become
- // runnable that wasn't before.
+ private def submitWaitingChildStages(parent: Stage) {
logTrace("Checking for newly runnable parent stages")
logTrace("running: " + runningStages)
logTrace("waiting: " + waitingStages)
logTrace("failed: " + failedStages)
- val waitingStagesCopy = waitingStages.toArray
- waitingStages.clear()
- for (stage <- waitingStagesCopy.sortBy(_.firstJobId)) {
+ val childStages =
waitingStages.filter(_.parents.contains(parent)).toArray
+ waitingStages --= childStages
+ for (stage <- childStages.sortBy(_.firstJobId)) {
submitStage(stage)
--- End diff --
Seems `submitWaitingChildStages ` is called to submit child stages when
the given `parent` stage is available. From this observation, do we have to
re-check missing parents inside `submitStage`?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]