weixiuli opened a new pull request #30716:
URL: https://github.com/apache/spark/pull/30716


   ### What changes were proposed in this pull request?
    Avoid calling unregisterMapOutput while the map stage is being rerun.
   
   ### Why are the changes needed?
   
   When a fetch failure happens, DAGScheduler tries to unregister the 
corresponding map output. The current logic has a race condition: a new 
map stage attempt may already be running when the current reduce stage attempt 
returns another fetch failure. The sequence is as follows: the current reduce 
stage first returns a fetch failure, which causes the map stage to be rerun; 
the rerunning map stage may then report a new MapStatus for the failed MapId 
before the current reduce stage returns another fetch failure for the same 
MapId. (The current reduce stage attempt is still the latest one, because the 
new map stage has not yet completed.) In this case, if the map output is always 
unregistered, the scheduler may actually unregister the map output produced by 
the new map stage attempt.
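
The race described above can be replayed with a small toy model. This is only a sketch under stated assumptions, not Spark code: `ToyScheduler`, `replay_race`, and the `skip_unregister_while_rerunning` flag are hypothetical names, and the guard shown (skip the unregister while a map stage rerun is in flight) is one reading of the proposed change rather than the actual patch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyScheduler:
    """Toy stand-in for a scheduler; hypothetical, not Spark's DAGScheduler."""

    skip_unregister_while_rerunning: bool
    map_stage_rerunning: bool = False
    # map index -> label of the map stage attempt that produced the output
    map_outputs: Dict[int, str] = field(default_factory=dict)

    def register_map_output(self, map_index: int, attempt: str) -> None:
        self.map_outputs[map_index] = attempt

    def on_fetch_failure(self, map_index: int) -> None:
        if self.skip_unregister_while_rerunning and self.map_stage_rerunning:
            # Guarded path: the registered output may already belong to the
            # rerun, so leave it alone.
            return
        # Unguarded path: always drop the output and resubmit the map stage.
        self.map_outputs.pop(map_index, None)
        self.map_stage_rerunning = True


def replay_race(guarded: bool) -> Optional[str]:
    """Replay the event order from the description; return the surviving output."""
    s = ToyScheduler(skip_unregister_while_rerunning=guarded)
    s.register_map_output(0, "attempt-0")
    s.on_fetch_failure(0)                   # first failure: output dropped, rerun starts
    s.register_map_output(0, "attempt-1")   # rerun reports a fresh output
    s.on_fetch_failure(0)                   # stale reduce task reports another failure
    return s.map_outputs.get(0)
```

In this model, calling `replay_race(False)` loses the output written by the rerun, while `replay_race(True)` keeps it.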
   
   ### Does this PR introduce _any_ user-facing change?
   No. It is a bug fix.
   
   ### How was this patch tested?
   Added new unit tests.
   




