xuanyuanking commented on issue #24892: [SPARK-25341][Core] Support rolling 
back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-526215837
 
 
   Many thanks @squito for the idea of reusing the task attempt id as the map 
id; this significantly reduces the code changes. I reimplemented the work in 
#25620. Besides the test changes and adding the map task attempt id to the map 
status, I found what may be the last overhead in this work. It concerns 
SortShuffleManager: we now need to record all the map task ids, whereas before 
we only kept the number of maps. Let's discuss this 
[here](https://github.com/apache/spark/pull/25620/files#diff-f0a98bdcfed7b93ab277e2b92c8fd9ecR86).
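
   To make the overhead concrete, here is a rough sketch of the bookkeeping, not the actual code in #25620. The class `MapTaskIdRegistrySketch` and its methods are hypothetical names. The assumption behind it is that task attempt ids are not contiguous, so a bare map count is no longer enough to enumerate a shuffle's map outputs at cleanup time:

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: track every map task id per shuffle instead of a count.
public class MapTaskIdRegistrySketch {
    // shuffleId -> ids of the map tasks that wrote output for that shuffle
    private final ConcurrentHashMap<Integer, Set<Long>> mapTaskIds =
        new ConcurrentHashMap<>();

    // Called once per map task when it obtains a writer for the shuffle.
    void recordMapTask(int shuffleId, long mapTaskId) {
        mapTaskIds
            .computeIfAbsent(shuffleId, k -> ConcurrentHashMap.newKeySet())
            .add(mapTaskId);
    }

    // Returns the recorded ids so the caller can delete the matching files.
    Set<Long> unregisterShuffle(int shuffleId) {
        Set<Long> ids = mapTaskIds.remove(shuffleId);
        return ids == null ? Collections.<Long>emptySet() : ids;
    }

    public static void main(String[] args) {
        MapTaskIdRegistrySketch registry = new MapTaskIdRegistrySketch();
        registry.recordMapTask(1, 100L);
        registry.recordMapTask(1, 205L);
        System.out.println(registry.unregisterShuffle(1).size());
    }
}
```

   The extra cost is one set entry per map task, compared with a single integer per shuffle before.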
   
   ```
   Oh, one more thing: changing from partition ID to task ID in ShuffleBlockId 
& friends would still qualify as a change in the shuffle service protocol, 
since there's a type change from int to long, and a lot of code in the shuffle 
service assumes that the id will be an integer.
   ```
   Thanks @vanzin for the reminder; compatibility with the external shuffle 
service is definitely an important consideration. We'll only make this 
extension for the new shuffle protocol. Thanks to the work in #24565, we can 
stay compatible with an old external shuffle service by using the old 
protocol. You can see the corresponding implementation 
[here](https://github.com/apache/spark/pull/25620/files#diff-6a9ff7fb74fd490a50462d45db2d5e11R1619).
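
   As an illustration of the int/long concern, here is a minimal sketch, not the real shuffle service code. All names in `ShuffleBlockIdSketch` are hypothetical, and the `useOldProtocol` flag stands in for whatever switch selects the old protocol. It assumes block ids of the form `shuffle_<shuffleId>_<mapId>_<reduceId>`:

```java
// Hypothetical sketch of why a long map id needs the new protocol.
public class ShuffleBlockIdSketch {
    // Old protocol keeps the int partition index; new one uses the task attempt id.
    static long mapIdFor(boolean useOldProtocol, int mapIndex, long taskAttemptId) {
        return useOldProtocol ? mapIndex : taskAttemptId;
    }

    // Builds a block id in the "shuffle_<shuffleId>_<mapId>_<reduceId>" form.
    static String blockId(int shuffleId, long mapId, int reduceId) {
        return "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId;
    }

    // Stand-in for old service code that assumes the map id is an int.
    static int parseMapIdAsInt(String blockId) {
        return Integer.parseInt(blockId.split("_")[2]);
    }

    // Stand-in for new code that accepts a long map id.
    static long parseMapIdAsLong(String blockId) {
        return Long.parseLong(blockId.split("_")[2]);
    }

    public static void main(String[] args) {
        long attemptId = 5_000_000_000L; // larger than Integer.MAX_VALUE
        String oldStyle = blockId(0, mapIdFor(true, 7, attemptId), 3);
        String newStyle = blockId(0, mapIdFor(false, 7, attemptId), 3);
        System.out.println(parseMapIdAsInt(oldStyle));
        System.out.println(parseMapIdAsLong(newStyle));
        try {
            parseMapIdAsInt(newStyle);
        } catch (NumberFormatException e) {
            System.out.println("int parser rejects " + newStyle);
        }
    }
}
```

   With the old protocol the id stays within the int range, so an int-based parser keeps working; only the new protocol ever carries a long.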
