Re: [PR] [SPARK-53575][CORE] Retry entire consumer stages when checksum mismatch detected for a retried shuffle map task [spark]

via GitHub Tue, 09 Dec 2025 00:58:49 -0800


ivoson commented on code in PR #52336:
URL: https://github.com/apache/spark/pull/52336#discussion_r2601685799



##########
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##########
@@ -2022,8 +2053,26 @@ private[spark] class DAGScheduler(
                 // The epoch of the task is acceptable (i.e., the task was launched after the most
                 // recent failure we're aware of for the executor), so mark the task's output as
                 // available.
-                mapOutputTracker.registerMapOutput(
+                val isChecksumMismatched = mapOutputTracker.registerMapOutput(
                   shuffleStage.shuffleDep.shuffleId, smt.partitionId, status)
+                if (isChecksumMismatched) {
+                  shuffleStage.isChecksumMismatched = isChecksumMismatched

Review Comment:
   Hi @mridulm, this flag is never set back to `false`. Once a checksum 
mismatch has been detected for the stage, we expect all succeeding stages to 
retry fully, because we don't know which version of the shuffle output the 
already-successful tasks consumed.
   
   This won't fail the app; the impact is that the succeeding stages go through 
a full retry.
   
   The code logic has changed slightly in PR: 
https://github.com/apache/spark/pull/53274 
   
   Please take a look once you get a chance. Thanks.
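
A minimal sketch of the behaviour described above, assuming a simplified model. `SketchStage`, `ChecksumRetrySketch`, `onMapOutputRegistered` and `partitionsToRerun` are hypothetical names for illustration only; they are not the actual `DAGScheduler` or `ShuffleMapStage` code, and the real logic may differ (see the linked PR).

```scala
// Hypothetical, simplified model; not the real Spark scheduler classes.
final class SketchStage(val id: Int) {
  // Sticky flag: set on the first checksum mismatch and never reset to false.
  var isChecksumMismatched: Boolean = false
}

object ChecksumRetrySketch {
  // Mirrors the shape of the diff: record a mismatch, but never clear it.
  def onMapOutputRegistered(stage: SketchStage, mismatch: Boolean): Unit = {
    if (mismatch) stage.isChecksumMismatched = true
  }

  // Partitions of a consumer stage that would have to run again.
  def partitionsToRerun(
      parent: SketchStage,
      numPartitions: Int,
      alreadySucceeded: Set[Int]): Seq[Int] = {
    if (parent.isChecksumMismatched) {
      // It is unknown which version of the shuffle output the finished
      // tasks read, so every partition is rerun.
      0 until numPartitions
    } else {
      // No mismatch seen: only the partitions that have not succeeded run.
      (0 until numPartitions).filterNot(alreadySucceeded.contains)
    }
  }
}
```

In this sketch a later registration without a mismatch leaves the flag set, so the consumer stage still reruns all of its partitions. The cost is extra recomputation, not an application failure.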



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

