weixiuli opened a new pull request #30716:
URL: https://github.com/apache/spark/pull/30716
### What changes were proposed in this pull request?
Avoid calling unregisterMapOutput when the map stage is already being rerun.
### Why are the changes needed?
When a fetch failure happens, DAGScheduler tries to unregister the
corresponding map output. The current logic has a race condition: a new map
stage attempt may already be running when the current reduce stage attempt
returns another fetch failure. The sequence is as follows. The reduce stage
first returns a fetch failure, which causes the map stage to be rerun. The
rerunning map stage may then register a new MapStatus for the failed MapId
before the reduce stage returns another fetch failure for the same MapId (the
reduce stage is still on its latest attempt because the new map stage attempt
has not yet completed). In this case, if the map output is always
unregistered, DAGScheduler may actually unregister the map output produced by
the new map stage attempt.
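
To make the race concrete, here is a minimal sketch in Scala. It is a toy model and not the patch itself: `ToyScheduler`, `mapOutputs`, `runningStages`, and `handleFetchFailure` are hypothetical names invented for illustration, and the guard only assumes the idea stated above (skip the unregister while the map stage is being rerun).

```scala
import scala.collection.mutable

// Toy stand-in for scheduler state; this is NOT Spark's DAGScheduler.
final class ToyScheduler {
  // mapId -> location of the registered map output (hypothetical representation)
  val mapOutputs: mutable.Map[Int, String] = mutable.Map.empty
  // IDs of stages that currently have a running attempt
  val runningStages: mutable.Set[Int] = mutable.Set.empty

  /** Handles a fetch failure for `mapId`; returns true if the output was unregistered. */
  def handleFetchFailure(mapStageId: Int, mapId: Int): Boolean = {
    if (runningStages.contains(mapStageId)) {
      // The map stage is already being rerun, so any output registered for
      // mapId may belong to the new attempt. Leave it in place.
      false
    } else {
      mapOutputs.remove(mapId).isDefined
    }
  }
}

object FetchFailureSketch {
  def main(args: Array[String]): Unit = {
    val scheduler = new ToyScheduler
    scheduler.mapOutputs(0) = "exec-1"

    // First fetch failure: the map stage is not running, so the stale output is removed.
    val first = scheduler.handleFetchFailure(mapStageId = 1, mapId = 0)

    // The map stage is resubmitted and its new attempt registers a fresh output.
    scheduler.runningStages += 1
    scheduler.mapOutputs(0) = "exec-2"

    // A late fetch failure for the same mapId arrives; the fresh output must survive.
    val second = scheduler.handleFetchFailure(mapStageId = 1, mapId = 0)

    println(s"first=$first second=$second outputKept=${scheduler.mapOutputs.contains(0)}")
  }
}
```

Without the `runningStages` check, the second call would remove the output registered by the new attempt, which is the behavior described above.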
### Does this PR introduce _any_ user-facing change?
No. It is a bug fix.
### How was this patch tested?
Added new unit tests.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]