Stove-hust opened a new pull request, #40393: URL: https://github.com/apache/spark/pull/40393
### What changes were proposed in this pull request? Copy the logic of handleTaskCompletion in DAGScheduler for processing the last shuffleMapTask into submitMissingTasks. ### Why are the changes needed? In condition of push-based shuffle being enabled and speculative tasks existing, a shuffleMapStage will be resubmitting once fetchFailed occurring, then its parent stages will be resubmitting firstly and it will cost some time to compute. Before the shuffleMapStage being resubmitted, its all speculative tasks success and register map output, but speculative task successful events can not trigger shuffleMergeFinalized( shuffleBlockPusher.notifyDriverAboutPushCompletion ) because this stage has been removed from runningStages. Then this stage is resubmitted, but speculative tasks have registered map output and there are no missing tasks to compute, resubmitting stages will also not trigger shuffleMergeFinalized. Eventually this stage‘s _shuffleMergedFinalized keeps false. Then AQE will submit next stages which are dependent on this shuffleMapStage occurring fetchFailed. And in getMissingParentStages, this stage will be marked as missing and will be resubmitted, but next stages are added to waitingStages after this stage being finished, so next stages will not be submitted even though this stage's resubmitting has been finished. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? This extreme case is very difficult to construct, and we added logs to our production environment to capture the number of problems and verify the stability of the job. I am happy to provide a timeline of the various events in which the problem arose。 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org