attilapiros commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r524349956
##########
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##########
@@ -1854,6 +1889,11 @@ private[spark] class DAGScheduler(
// host, including from those that we still haven't confirmed as lost due to heartbeat
// delays.
ignoreShuffleFileLostEpoch = isHostDecommissioned)
+
+ if (pushBasedShuffleEnabled) {
+ // Remove fetchFailed host in the shuffle push merger list for push based shuffle
+ env.blockManager.master.removeShufflePushMergerLocation(bmAddress.host)
+ }
Review comment:
I think it would be better to move this into the
`removeExecutorAndUnregisterOutputs` method, right before the line
`blockManagerMaster.removeExecutor(execId)`, as there it will be protected from
reprocessing follow-up fetch failures. For details, please see the epoch
checks.
You can also use `blockManagerMaster` instead of
`env.blockManager.master`.
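
To illustrate the kind of epoch guard I mean, here is a simplified,
self-contained sketch. It is not the actual `DAGScheduler` code:
`FakeBlockManagerMaster`, `EpochGuardSketch` and `handleExecutorLost` are
made-up names, and the real method takes more parameters. The point is only
that a call placed inside the epoch check runs once per failure epoch, so
follow-up fetch failures from the same executor do not repeat it.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the block manager master; it only records calls.
final class FakeBlockManagerMaster {
  val removedExecutors = mutable.ArrayBuffer.empty[String]
  val removedMergerHosts = mutable.ArrayBuffer.empty[String]

  def removeExecutor(execId: String): Unit = removedExecutors += execId

  def removeShufflePushMergerLocation(host: String): Unit =
    removedMergerHosts += host
}

// Simplified model of an epoch-guarded executor removal (assumed shape).
final class EpochGuardSketch(
    master: FakeBlockManagerMaster,
    pushBasedShuffleEnabled: Boolean) {

  // Last epoch at which each executor was already handled as lost.
  private val executorFailureEpoch = mutable.HashMap.empty[String, Long]

  def handleExecutorLost(execId: String, host: String, currentEpoch: Long): Unit = {
    // A failure is new only if this executor was never recorded,
    // or was recorded at an older epoch.
    val isNewFailure = executorFailureEpoch.get(execId).forall(_ < currentEpoch)
    if (isNewFailure) {
      executorFailureEpoch(execId) = currentEpoch
      if (pushBasedShuffleEnabled) {
        // Inside the guard: skipped for repeated failures in the same epoch.
        master.removeShufflePushMergerLocation(host)
      }
      master.removeExecutor(execId)
    }
  }
}
```

With this shape, calling `handleExecutorLost("exec-1", "host-a", 5L)` twice
removes the merger location only once; a later call with epoch `6L` removes
it again.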