cloud-fan commented on code in PR #53782:
URL: https://github.com/apache/spark/pull/53782#discussion_r2699541616
##########
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##########
@@ -2312,19 +2313,20 @@ private[spark] class DAGScheduler(
failedStages += failedStage
failedStages += mapStage
if (noResubmitEnqueued) {
- // If the map stage is INDETERMINATE, which means the map tasks may return
- // different result when re-try, we need to re-try all the tasks of the failed
- // stage and its succeeding stages, because the input data will be changed after the
- // map tasks are re-tried.
- // Note that, if map stage is UNORDERED, we are fine. The shuffle partitioner is
- // guaranteed to be determinate, so the input data of the reducers will not change
- // even if the map tasks are re-tried.
- if (mapStage.isIndeterminate && !mapStage.shuffleDep.checksumMismatchFullRetryEnabled) {
- val stagesToRollback = collectSucceedingStages(mapStage)
- val stagesCanRollback = filterAndAbortUnrollbackableStages(stagesToRollback)
- logInfo(log"The shuffle map stage ${MDC(STAGE, mapStage)} with indeterminate output " +
- log"was failed, we will roll back and rerun below stages which include itself and all " +
- log"its indeterminate child stages: ${MDC(STAGES, stagesCanRollback)}")
+ // For statically indeterminate stages, trigger rollback early (here and in
+ // submitMissingTasks) rather than deferring to task completion. This is more
+ // efficient because it clears shuffle outputs before the retry is submitted,
+ // ensuring findMissingPartitions() returns all partitions.
+ //
+ // For runtime detection (checksum mismatch), rollback is triggered at task
+ // completion when the mismatch is discovered.
+ //
+ // The `rollbackCurrentStage = true` parameter ensures the failed map stage is
Review Comment:
I think this comment is clear enough. Shall we also use it to replace the long one in
https://github.com/apache/spark/pull/53782/files#diff-85de35b2e85646ed499c545a3be1cd3ffd525a88aae835a9c621f877eebadcb6R1570 ?
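To illustrate why clearing shuffle outputs before the retry is submitted makes findMissingPartitions() return every partition, here is a minimal toy model. It is a sketch under stated assumptions, not Spark's DAGScheduler API: `ToyShuffleStage`, `rollback`, `partitionsToRetry`, and the `staticallyIndeterminate` flag are hypothetical names, and the runtime checksum-mismatch path (rollback at task completion) is not modeled.

```scala
// Hypothetical toy model of the comment's reasoning; NOT Spark's DAGScheduler API.
// A "stage" here only tracks which shuffle map partitions have registered output.
final case class ToyShuffleStage(numPartitions: Int, var available: Set[Int]) {
  // Analogous to findMissingPartitions(): partitions with no registered output.
  def findMissingPartitions(): Seq[Int] =
    (0 until numPartitions).filterNot(available.contains)

  // Hypothetical rollback: clears every registered output of the stage.
  def rollback(): Unit = available = Set.empty
}

object EarlyRollbackSketch {
  // Early rollback: for a statically indeterminate stage, clear outputs *before*
  // computing which partitions the retry must run, so every partition is rerun.
  // A determinate stage keeps its surviving outputs and reruns only the lost ones.
  def partitionsToRetry(stage: ToyShuffleStage, staticallyIndeterminate: Boolean): Seq[Int] = {
    if (staticallyIndeterminate) stage.rollback()
    stage.findMissingPartitions()
  }
}
```

For a 4-partition stage where only partition 2's output was lost, the determinate case retries just partition 2, while the early rollback makes the indeterminate case retry partitions 0 through 3 in a single resubmission.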
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]