Ngone51 commented on code in PR #43954:
URL: https://github.com/apache/spark/pull/43954#discussion_r1408649384
##########
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala:
##########
@@ -296,18 +296,32 @@ private[spark] class TaskSchedulerImpl(
new TaskSetManager(this, taskSet, maxTaskFailures, healthTrackerOpt, clock)
}
+ // Kill all the tasks in all the stage attempts of the same stage Id. Note stage attempts won't
+ // be aborted but will be marked as zombie. The stage attempt will be finished and cleaned up
+ // once all the tasks has been finished. The stage attempt could be aborted after the call of
Review Comment:
```
def abort(message: String, exception: Option[Throwable] = None): Unit =
  sched.synchronized {
    sched.dagScheduler.taskSetFailed(taskSet, message, exception)
    isZombie = true
    maybeFinishTaskSet()
  }
```
Any call to `abort` marks the TSM as zombie, so the key difference must come
from `dagScheduler.taskSetFailed`.
`dagScheduler.taskSetFailed` essentially cleans up the data related to this
stage and fails the jobs that depend on it.
From the TSM's point of view there is no difference between zombie and abort:
its running tasks keep running until they finish (whether killed or succeeded).
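
To make the comparison concrete, here is a minimal toy model of the two paths. It is only a sketch, not Spark's real code: `ToyDagScheduler`, `ToyTaskSetManager`, `markZombie`, and `taskEnded` are hypothetical names invented for illustration, and the only state modeled is the zombie flag, a running-task counter, and the record of which task sets were reported as failed.

```scala
import scala.collection.mutable

// Hypothetical stand-in for DAGScheduler: it only records which task sets
// were reported as failed. The real method also cleans up stage data and
// fails dependent jobs.
final class ToyDagScheduler {
  val failedTaskSets: mutable.Buffer[String] = mutable.Buffer.empty

  def taskSetFailed(taskSetId: String, message: String): Unit =
    failedTaskSets += taskSetId
}

// Hypothetical stand-in for TaskSetManager, tracking only the state
// discussed in this thread.
final class ToyTaskSetManager(val id: String, dag: ToyDagScheduler, numTasks: Int) {
  var isZombie: Boolean = false
  var finished: Boolean = false
  private var runningTasks: Int = numTasks

  // Zombie only: flip the flag, without notifying the DAGScheduler.
  def markZombie(): Unit = {
    isZombie = true
    maybeFinishTaskSet()
  }

  // Abort: the same zombie transition, plus the DAGScheduler notification.
  def abort(message: String): Unit = {
    dag.taskSetFailed(id, message)
    isZombie = true
    maybeFinishTaskSet()
  }

  // A running task ends (killed or succeeded); both cases take the same path.
  def taskEnded(): Unit = {
    runningTasks -= 1
    maybeFinishTaskSet()
  }

  // The task set finishes only once it is a zombie with no running tasks left.
  private def maybeFinishTaskSet(): Unit =
    if (isZombie && runningTasks == 0) finished = true
}
```

In this model, calling `markZombie()` on one manager and `abort(...)` on another leaves both in the same local state: zombie, and unfinished until every running task has ended. The only observable difference is the entry that `abort` adds to `failedTaskSets`.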
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]