xuanyuanking edited a comment on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-526215837

Many thanks @squito for the idea of reusing the task attempt id as the map id; this significantly reduces the code changes. I reimplemented the work in #25620. Besides the test changes and adding the map task attempt id to the map status, I found what may be the last overhead of this work: in SortShuffleManager we now need to record all the map task ids, whereas before we only kept the number of map tasks. Let's discuss this [here](https://github.com/apache/spark/pull/25620/files#diff-f0a98bdcfed7b93ab277e2b92c8fd9ecR86).

```
Oh, one more thing: changing from partition ID to task ID in ShuffleBlockId & friends would still qualify as a change in the shuffle service protocol, since there's a type change from int to long, and a lot of code in the shuffle service assumes that the id will be an integer.
```

Thanks @vanzin for the reminder; compatibility with the external shuffle service is definitely an important consideration. We'll only make this extension for the new shuffle protocol. Thanks to the work in #24565, we can stay compatible with an old external shuffle service by using the old protocol; you can see the corresponding implementation [here](https://github.com/apache/spark/pull/25620/files#diff-6a9ff7fb74fd490a50462d45db2d5e11R1619).
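To make the int-to-long concern concrete, here is a minimal, self-contained Java sketch. It is not Spark's code: the class `MapIdSketch`, its methods, and the block-name layout are hypothetical and chosen only for illustration. It shows that a map id carried as a `long` task attempt id round-trips through a string name, while a reader that still assumes an `int` would silently truncate a large id.

```java
// Hypothetical illustration only; none of these names come from Spark.
final class MapIdSketch {

    // Build an illustrative block name with the map id carried as a long.
    static String blockName(int shuffleId, long mapId, int reduceId) {
        return "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId;
    }

    // Parse the map id back as a long, as a reader of the extended format would.
    static long parseMapId(String name) {
        String[] parts = name.split("_");
        if (parts.length != 4 || !parts[0].equals("shuffle")) {
            throw new IllegalArgumentException("Unexpected block name: " + name);
        }
        return Long.parseLong(parts[2]);
    }

    // True when the id would survive code that still stores it in an int.
    static boolean fitsInInt(long mapId) {
        return mapId >= Integer.MIN_VALUE && mapId <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long taskAttemptId = 3_000_000_000L; // larger than Integer.MAX_VALUE
        String name = blockName(0, taskAttemptId, 5);

        System.out.println(name);                     // shuffle_0_3000000000_5
        System.out.println(parseMapId(name));         // 3000000000
        System.out.println(fitsInInt(taskAttemptId)); // false
        // A narrowing cast silently wraps around instead of failing:
        System.out.println((int) taskAttemptId);      // -1294967296
    }
}
```

Under these assumptions, any component that still reads the id as an `int` needs either the old protocol or an explicit range check, which is why the extension is limited to the new shuffle protocol.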
