[GitHub] [spark] cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

GitBox Mon, 12 Aug 2019 09:29:57 -0700

cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a 
shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-520497403
 
 
   After another look, I think speculative tasks are OK. When we run an 
indeterminate shuffle map stage, it's always a fresh run (either the first run, 
or a retry that reruns all the downstream stages). Sorry for missing this 
before.
   
   It's fine to write shuffle files with speculative tasks. The shuffle map 
task writes to a temp file first, and then tries to rename the temp file to the 
formal shuffle file name (`shuffleId-mapId-reduceId`). If a file with the formal 
shuffle file name already exists, the task gives up and deletes the temp file.
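
   To make the commit step concrete, here is a minimal sketch of the 
write-to-temp-then-publish pattern described above. It is not Spark's actual 
code: the function name `commit_shuffle_file` and the file name `0-1-0` are 
hypothetical, and using a hard link to make "check and publish" a single 
atomic step is an assumption of this sketch, not something stated in the PR.

```python
import os
import tempfile


def commit_shuffle_file(directory, final_name, data):
    """Write data to a temp file, then publish it under final_name.

    Returns True if this attempt published the file, False if another
    attempt (e.g. a speculative task) had already published it.
    Hypothetical helper for illustration only.
    """
    final_path = os.path.join(directory, final_name)

    # Step 1: write everything to a temp file in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        f.write(data)

    try:
        # Step 2: publish under the formal name. os.link raises
        # FileExistsError if final_path already exists, so the existence
        # check and the publish happen in one step (sketch assumption).
        os.link(tmp_path, final_path)
        return True
    except FileExistsError:
        # Another attempt won: give up and keep the existing file.
        return False
    finally:
        # Step 3: the temp name is removed in both cases.
        os.remove(tmp_path)
```

   With this pattern, two attempts that race on the same formal name leave 
exactly one file behind, holding the output of whichever attempt committed 
first, for example `commit_shuffle_file(d, "0-1-0", b"...")` called twice on 
the same directory `d`.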
   
   I think it's a good idea to use TID (task ID) instead of partition ID to 
represent mapId. There would be no more file name conflicts. We can keep the 
shuffle protocol unchanged, but there will be a little overhead in 
`ShuffleStatus`, which I think is acceptable.
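
   A small illustration of why the conflict disappears, assuming the 
`shuffleId-mapId-reduceId` naming mentioned above. The helper 
`shuffle_file_name`, the task IDs 17 and 42, and the `partition_to_map_id` 
dictionary are all hypothetical; the dictionary only stands in for the extra 
bookkeeping this change would imply and is not the real `ShuffleStatus` 
structure.

```python
def shuffle_file_name(shuffle_id, map_id, reduce_id):
    # Follows the shuffleId-mapId-reduceId pattern from the comment.
    return f"{shuffle_id}-{map_id}-{reduce_id}"


# Two attempts of the same map partition (partition 3), e.g. the original
# task and a speculative copy. The task IDs are made-up values.
attempts = [
    {"partition_id": 3, "tid": 17},
    {"partition_id": 3, "tid": 42},
]

# mapId = partition ID: both attempts target the same file name.
names_by_partition = {shuffle_file_name(0, a["partition_id"], 0) for a in attempts}

# mapId = TID: each attempt gets its own file name.
names_by_tid = {shuffle_file_name(0, a["tid"], 0) for a in attempts}

# The price: something has to remember which TID's output is the committed
# one for each partition (here, the last attempt is picked arbitrarily).
partition_to_map_id = {a["partition_id"]: a["tid"] for a in attempts}
```

   Here `names_by_partition` contains a single name shared by both attempts, 
while `names_by_tid` contains two distinct names.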
   
   One concern is that it will be hard to test. We would need to query 
`MapOutputTracker` to get the mapId, instead of writing mapIds (0, 1, 2, 3, ...) 
directly in tests.
   
   I think this is worth a discussion. cc @vanzin @tgravescs @jiangxb1987


