yabola commented on code in PR #49270:
URL: https://github.com/apache/spark/pull/49270#discussion_r1896703317
##########
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##########
@@ -2937,7 +2937,9 @@ private[spark] class DAGScheduler(
       } else {
         // This stage is only used by the job, so finish the stage if it is running.
         val stage = stageIdToStage(stageId)
-        if (runningStages.contains(stage)) {
+        val isRunningStage = runningStages.contains(stage) ||
+          (waitingStages.contains(stage) &&
+            taskScheduler.hasRunningTasks(stageId))
Review Comment:
> what if we just kill all waiting stages? Does taskScheduler.killAllTaskAttempts handle it well?
I tested this: if `killAllTaskAttempts` is called on the normally generated waiting stages, the stage status is displayed as `FAILED` (it was `SKIPPED` before); `killAllTaskAttempts` itself does not misbehave.
Always killing waiting stages seems to be the safer approach (their tasks shouldn't run anymore), but it may generate unnecessary `stageFailed` events compared with the previous behavior.
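For reference, a rough sketch (not this PR's final code) of what the always-kill variant could look like in the job-finish path; `killAllTaskAttempts`, `markStageAsFinished` and `shouldInterruptTaskThread` are existing calls, and the surrounding control flow is simplified:
```scala
// Sketch only: also kill waiting stages of the finished job, not just running ones.
val stage = stageIdToStage(stageId)
if (runningStages.contains(stage) || waitingStages.contains(stage)) {
  try {
    // Throws UnsupportedOperationException if the backend cannot kill tasks.
    taskScheduler.killAllTaskAttempts(
      stageId, shouldInterruptTaskThread(job), reason = "Job finished")
    // This is what turns a waiting stage's UI status from SKIPPED into FAILED.
    markStageAsFinished(stage, Some("Job finished"))
  } catch {
    case e: UnsupportedOperationException =>
      logWarning(s"Could not kill all tasks for stage $stageId", e)
  }
}
```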
> and can we add a special flag to indicate the waiting stages that are submitted due to retry?
Yes, we can add a flag; please see the updated code.
Actually, there is a trade-off here in deciding which waiting stages to kill (sketched below):
- always kill
- kill only stages that had failed before (`stage#failedAttemptIds` is non-empty)
- kill only stages that were resubmitted after a fetch failure (`stage#resubmitInFetchFailed`)
- kill only stages that currently have running tasks
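To make the trade-off concrete, here is a minimal sketch of the four options as predicates over a waiting stage. The `KillPolicy` names are made up for illustration; `stage.failedAttemptIds` is an existing field, while `resubmitInFetchFailed` and `taskScheduler.hasRunningTasks` are the additions proposed in this PR:
```scala
// Sketch only: which waiting stages of a finished job should be killed.
sealed trait KillPolicy
case object AlwaysKill extends KillPolicy
case object KillIfEverFailed extends KillPolicy
case object KillIfResubmittedOnFetchFailure extends KillPolicy
case object KillIfTasksRunning extends KillPolicy

def shouldKillWaitingStage(policy: KillPolicy, stage: Stage): Boolean = policy match {
  case AlwaysKill => true
  case KillIfEverFailed => stage.failedAttemptIds.nonEmpty            // the stage failed before
  case KillIfResubmittedOnFetchFailure => stage.resubmitInFetchFailed // flag proposed in this PR
  case KillIfTasksRunning => taskScheduler.hasRunningTasks(stage.id)  // method proposed in this PR
}
```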