[GitHub] [spark] mridulm commented on pull request #34735: [SPARK-37481][Core][WebUI] Fix disappearance of skipped stages after they retry

GitBox Tue, 07 Dec 2021 03:50:32 -0800


mridulm commented on pull request #34735:
URL: https://github.com/apache/spark/pull/34735#issuecomment-987853351



   I understand that AQE leverages a shuffle map stage to submit a job.
   My question was: how are users inferring the last part, that stage 0 
was retried due to the stage 3 failure?
   
   My query was a response to this:
   >Per the figure in PR desc, we will both keep the skipped info and retry 
info, it's clear for us to know that stage 2 once get skipped because stage 0 
has all the map outputs, and gets retried because stage 3 failed with fetch 
failed issues.
   
   What I want to make sure users can understand is the inference that stage 3 
failing with a fetch failure resulted in stage 2 being re-executed. For skipped 
stages specifically, we don't expose why stage 2 was initially skipped or what 
caused it to be re-executed.
   
   But thinking about it more, I feel this PR might be a strict improvement 
over the current state.
   At least users can see what the original (skipped) stage details were, even 
though it was skipped and is now being re-executed with a different set of 
partitions.
   
   +CC @tgravescs 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



