attilapiros commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r524349956



##########
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##########
@@ -1854,6 +1889,11 @@ private[spark] class DAGScheduler(
               // host, including from those that we still haven't confirmed as lost due to heartbeat
               // delays.
               ignoreShuffleFileLostEpoch = isHostDecommissioned)
+
+            if (pushBasedShuffleEnabled) {
+              // Remove fetchFailed host in the shuffle push merger list for push based shuffle
+              env.blockManager.master.removeShufflePushMergerLocation(bmAddress.host)
+            }

Review comment:
       I think it would be better to move this into the
`removeExecutorAndUnregisterOutputs` method, right before the line
`blockManagerMaster.removeExecutor(execId)`, because there it is protected from
being reprocessed on follow-up fetch failures. For details, please see the epoch
checks.

   You can also use `blockManagerMaster` instead of
`env.blockManager.master`.
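
   To illustrate the suggestion, here is a minimal, self-contained sketch of an epoch-guarded removal. It is not the actual `DAGScheduler` code: `FakeBlockManagerMaster`, `EpochGuardedScheduler`, `EpochGuardDemo`, the simplified method signature, and the `executorFailureEpoch` map as written here are assumptions made for illustration only. The point is that a call placed inside the epoch check runs once per failure epoch, so follow-up fetch failures from the same executor do not trigger it again.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the block manager master; it only records calls.
final class FakeBlockManagerMaster {
  val removedExecutors = mutable.ArrayBuffer.empty[String]
  val removedMergerHosts = mutable.ArrayBuffer.empty[String]

  def removeExecutor(execId: String): Unit = removedExecutors += execId
  def removeShufflePushMergerLocation(host: String): Unit = removedMergerHosts += host
}

// Simplified model of an epoch-guarded executor removal (an assumption, not Spark's code).
final class EpochGuardedScheduler(
    blockManagerMaster: FakeBlockManagerMaster,
    pushBasedShuffleEnabled: Boolean) {

  // Epoch at which each executor's failure was last processed.
  private val executorFailureEpoch = mutable.HashMap.empty[String, Long]

  def removeExecutorAndUnregisterOutputs(execId: String, host: String, epoch: Long): Unit = {
    // Act only if this failure is newer than the one already handled for the executor.
    if (executorFailureEpoch.get(execId).forall(_ < epoch)) {
      executorFailureEpoch(execId) = epoch
      if (pushBasedShuffleEnabled) {
        // Placed inside the guard, so repeated fetch failures do not repeat the removal.
        blockManagerMaster.removeShufflePushMergerLocation(host)
      }
      blockManagerMaster.removeExecutor(execId)
    }
  }
}

object EpochGuardDemo {
  // Reports the same failure twice, then a newer one, and returns the recorded calls.
  def run(): FakeBlockManagerMaster = {
    val master = new FakeBlockManagerMaster
    val scheduler = new EpochGuardedScheduler(master, pushBasedShuffleEnabled = true)
    scheduler.removeExecutorAndUnregisterOutputs("exec-1", "host-a", epoch = 5L)
    scheduler.removeExecutorAndUnregisterOutputs("exec-1", "host-a", epoch = 5L) // same epoch: skipped
    scheduler.removeExecutorAndUnregisterOutputs("exec-1", "host-a", epoch = 6L) // newer epoch: handled
    master
  }
}
```

   In this sketch the second call is skipped because its epoch is not newer than the recorded one, which is the protection the epoch checks are meant to provide.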






