mridulm commented on pull request #34735: URL: https://github.com/apache/spark/pull/34735#issuecomment-987853351
I understand that AQE leverages the shuffle map stage to submit a job. My question was about the last part: "How are users inferring that stage 0 was retried due to the stage 3 failure?"

My question was a response to this:

> Per the figure in PR desc, we will both keep the skipped info and retry info, it's clear for us to know that stage 2 once get skipped because stage 0 has all the map outputs, and gets retried because stage 3 failed with fetch failed issues.

What I want to make sure of is that users can draw that inference: that the fetch failure in stage 3 is what caused stage 2 to be re-executed. For skipped stages specifically, we don't expose why stage 2 was initially skipped, or what caused it to be re-executed.

But thinking about it more, I feel this PR might be a strict improvement over the current state. At least users can see the details of the original (skipped) stage, even though it was skipped and is now being re-executed with a different set of partitions.

+CC @tgravescs