Stove-hust opened a new pull request, #40393:
URL: https://github.com/apache/spark/pull/40393

   ### What changes were proposed in this pull request?
   Copy the logic that handleTaskCompletion in DAGScheduler uses when the last 
shuffleMapTask of a stage completes into submitMissingTasks, so that it also 
runs when a resubmitted stage has no missing tasks.
   
   
   ### Why are the changes needed?
   When push-based shuffle is enabled and speculative tasks exist, a 
shuffleMapStage is resubmitted once a fetchFailed occurs. Its parent stages are 
resubmitted first, and recomputing them takes some time. Before the 
shuffleMapStage itself is resubmitted, all of its speculative tasks succeed and 
register their map output, but these success events cannot trigger 
shuffleMergeFinalized (shuffleBlockPusher.notifyDriverAboutPushCompletion) 
because the stage has already been removed from runningStages.
   
   The stage is then resubmitted, but because the speculative tasks have 
already registered the map output, there are no missing tasks to compute, so 
the resubmission does not trigger shuffleMergeFinalized either. As a result, 
this stage's _shuffleMergedFinalized stays false.
   
   AQE then submits the next stages, which depend on the shuffleMapStage that 
hit the fetchFailed. In getMissingParentStages this stage is marked as missing 
and is resubmitted, but the next stages are added to waitingStages only after 
this stage has finished, so they are never submitted even though the 
resubmitted stage has completed.
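
   The sequence above can be illustrated with a small toy model. This is only a 
sketch under simplifying assumptions, not Spark's DAGScheduler: `ToyStage`, 
`ToyScheduler`, and the `finalize_on_empty_resubmit` flag are hypothetical 
names invented for illustration, and merge finalization is modeled as a queued 
event. With the flag off, the child stage stays in the waiting list; with it 
on, resubmitting a stage that has no missing tasks still schedules 
finalization, and the child is launched.

```python
class ToyStage:
    """Hypothetical stand-in for a stage; not Spark's Stage class."""
    def __init__(self, name, num_partitions, parent=None, needs_merge=False):
        self.name = name
        self.num_partitions = num_partitions
        self.parent = parent
        self.outputs = set()
        # Stages without push-based merge count as finalized from the start.
        self.merge_finalized = not needs_merge

    def missing_partitions(self):
        return [p for p in range(self.num_partitions) if p not in self.outputs]

    def is_available(self):
        # A merge-enabled stage is only usable once its merge is finalized.
        return not self.missing_partitions() and self.merge_finalized


class ToyScheduler:
    def __init__(self, finalize_on_empty_resubmit):
        self.finalize_on_empty_resubmit = finalize_on_empty_resubmit
        self.running, self.waiting = set(), []
        self.pending_finalize, self.launched = [], []

    def on_task_success(self, stage, partition):
        stage.outputs.add(partition)
        # Finalization is only scheduled for stages that are still running.
        if stage in self.running and not stage.missing_partitions():
            self.pending_finalize.append(stage)

    def on_fetch_failed(self, stage):
        self.running.discard(stage)

    def submit_stage(self, stage):
        parent = stage.parent
        if parent is not None and not parent.is_available():
            self.submit_stage(parent)
            self.waiting.append(stage)  # added after the parent was handled
            return
        self.submit_missing_tasks(stage)

    def submit_missing_tasks(self, stage):
        self.running.add(stage)
        if stage.missing_partitions():
            self.launched.append(stage.name)
        elif self.finalize_on_empty_resubmit and not stage.merge_finalized:
            self.pending_finalize.append(stage)  # modeled fix
        else:
            self.running.discard(stage)  # treated as done, never finalized
            self.submit_waiting_children(stage)

    def submit_waiting_children(self, parent):
        children = [s for s in self.waiting if s.parent is parent]
        self.waiting = [s for s in self.waiting if s.parent is not parent]
        for child in children:
            self.submit_stage(child)

    def drain_events(self):
        while self.pending_finalize:
            stage = self.pending_finalize.pop(0)
            stage.merge_finalized = True
            self.running.discard(stage)
            self.submit_waiting_children(stage)


def run_scenario(finalize_on_empty_resubmit):
    sched = ToyScheduler(finalize_on_empty_resubmit)
    map_stage = ToyStage("map", 2, needs_merge=True)
    reduce_stage = ToyStage("reduce", 1, parent=map_stage)
    sched.submit_stage(map_stage)           # first attempt launches tasks
    sched.on_task_success(map_stage, 0)
    sched.on_fetch_failed(map_stage)        # stage leaves the running set
    sched.on_task_success(map_stage, 1)     # speculative task finishes late
    sched.submit_stage(map_stage)           # resubmit: nothing is missing
    sched.drain_events()
    sched.submit_stage(reduce_stage)        # next stage submitted afterwards
    sched.drain_events()
    return sched.launched, [s.name for s in sched.waiting]
```

   Calling `run_scenario(False)` models the behavior described above, and 
`run_scenario(True)` models the proposed change; comparing the returned 
launched and waiting lists shows whether the `reduce` stage was ever started.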
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   This corner case is very difficult to reproduce in a test. We added logs in 
our production environment to count occurrences of the problem and to verify 
the stability of the job. I am happy to provide a timeline of the events that 
led to the problem.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

