ivoson commented on code in PR #52336:
URL: https://github.com/apache/spark/pull/52336#discussion_r2601685799
##########
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##########
@@ -2022,8 +2053,26 @@ private[spark] class DAGScheduler(
// The epoch of the task is acceptable (i.e., the task was launched after the most
// recent failure we're aware of for the executor), so mark the task's output as
// available.
- mapOutputTracker.registerMapOutput(
+ val isChecksumMismatched = mapOutputTracker.registerMapOutput(
shuffleStage.shuffleDep.shuffleId, smt.partitionId, status)
+ if (isChecksumMismatched) {
+ shuffleStage.isChecksumMismatched = isChecksumMismatched
Review Comment:
Hi @mridulm, this flag is not set back to `false`. Once a checksum mismatch
happens for the stage, we expect all succeeding stages to fully retry, because
we don't know which version of the shuffle output the successful tasks
consumed.
This won't fail the app; the impact is that the succeeding stages would be
fully retried.
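To make the intended behavior concrete, here is a minimal sketch of the sticky flag. It is a simplified stand-alone model, not the actual `DAGScheduler` code: `StageSketch`, `onMapOutputRegistered`, and `partitionsToRetry` are hypothetical names used only for illustration, and the `mismatched` argument stands in for the value returned by `mapOutputTracker.registerMapOutput`.

```scala
// Hypothetical, simplified model of the behavior described above.
// None of these names exist in Spark; they are illustration only.
final class StageSketch(val id: Int) {
  // Sticky flag: set on the first checksum mismatch and never reset.
  var isChecksumMismatched: Boolean = false
}

object ChecksumRetrySketch {
  // `mismatched` stands in for the result of registerMapOutput(...).
  def onMapOutputRegistered(stage: StageSketch, mismatched: Boolean): Unit = {
    if (mismatched) {
      // Only ever flips to true; a later clean registration does not clear it.
      stage.isChecksumMismatched = true
    }
  }

  // Partitions a succeeding stage has to recompute (assumed policy):
  // everything if the parent ever saw a mismatch, otherwise only the missing ones.
  def partitionsToRetry(
      parent: StageSketch,
      allPartitions: Seq[Int],
      missingPartitions: Seq[Int]): Seq[Int] = {
    if (parent.isChecksumMismatched) allPartitions else missingPartitions
  }

  def main(args: Array[String]): Unit = {
    val parent = new StageSketch(0)
    val all = Seq(0, 1, 2, 3)
    val missing = Seq(2)

    onMapOutputRegistered(parent, mismatched = false)
    println(partitionsToRetry(parent, all, missing)) // only the missing partition

    onMapOutputRegistered(parent, mismatched = true)
    onMapOutputRegistered(parent, mismatched = false) // does not reset the flag
    println(partitionsToRetry(parent, all, missing)) // full retry
  }
}
```

The trade-off in this sketch is extra recomputation in exchange for never mixing results computed from different versions of the shuffle output.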
The code logic has changed a little in PR:
https://github.com/apache/spark/pull/53274
Please take a look once you get a chance. Thanks.