ivoson commented on code in PR #53782:
URL: https://github.com/apache/spark/pull/53782#discussion_r2702983588


##########
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##########
@@ -2312,19 +2313,20 @@ private[spark] class DAGScheduler(
             failedStages += failedStage
             failedStages += mapStage
             if (noResubmitEnqueued) {
-              // If the map stage is INDETERMINATE, which means the map tasks may return
-              // different result when re-try, we need to re-try all the tasks of the failed
-              // stage and its succeeding stages, because the input data will be changed after the
-              // map tasks are re-tried.
-              // Note that, if map stage is UNORDERED, we are fine. The shuffle partitioner is
-              // guaranteed to be determinate, so the input data of the reducers will not change
-              // even if the map tasks are re-tried.
-              if (mapStage.isIndeterminate && !mapStage.shuffleDep.checksumMismatchFullRetryEnabled) {
-                val stagesToRollback = collectSucceedingStages(mapStage)
-                val stagesCanRollback = filterAndAbortUnrollbackableStages(stagesToRollback)
-                logInfo(log"The shuffle map stage ${MDC(STAGE, mapStage)} with indeterminate output " +
-                  log"was failed, we will roll back and rerun below stages which include itself and all " +
-                  log"its indeterminate child stages: ${MDC(STAGES, stagesCanRollback)}")
+              // For statically indeterminate stages, trigger rollback early (here and in
+              // submitMissingTasks) rather than deferring to task completion. This is more
+              // efficient because it clears shuffle outputs before the retry is submitted,
+              // ensuring findMissingPartitions() returns all partitions.
+              //
+              // For runtime detection (checksum mismatch), rollback is triggered at task
+              // completion when the mismatch is discovered.
+              //
+              // The `rollbackCurrentStage = true` parameter ensures the failed map stage is

Review Comment:
   Sounds good. Updated, thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

