cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-520497403

After another look, I think speculative tasks are OK. When we run an indeterminate shuffle map stage, it is always a fresh run (either the first run, or a retry that reruns all the downstream stages). Sorry about missing this before.

It's fine to write shuffle files with speculative tasks. The shuffle map task writes to a temp file first, and then tries to rename the temp file to the formal shuffle file name (`shuffleId-mapId-reduceId`). If a file with the formal shuffle file name already exists, the task gives up and deletes the temp file.

I think it's a good idea to use the TID instead of the partition ID to represent mapId. Then there are no file name conflicts anymore. We can keep the shuffle protocol unchanged, but there will be a little overhead in `ShuffleStatus`, which I think is acceptable.

One concern is that it will be hard to test. We would now need to query `MapOutputTracker` to get the mapId, instead of writing mapId (0, 1, 2, 3, ...) directly in tests.

I think this is worth a discussion, cc @vanzin @tgravescs @jiangxb1987
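
For readers following the thread, the write-to-temp-then-rename step described above can be sketched roughly as follows. This is a minimal illustration in Python, not Spark's actual code: the function names `formal_name` and `commit_shuffle_output`, the `.tmp-` prefix, and the use of a hard link as the "rename only if the target is absent" primitive are all assumptions made for the example.

```python
import os
import tempfile
import uuid


def formal_name(shuffle_id, map_id, reduce_id):
    # Naming scheme quoted in the comment: shuffleId-mapId-reduceId.
    return f"{shuffle_id}-{map_id}-{reduce_id}"


def commit_shuffle_output(directory, shuffle_id, map_id, reduce_id, payload):
    """Write payload to a temp file, then try to publish it under the formal name.

    Returns True if this attempt's output was published, and False if a file
    with the formal name already existed (this attempt's output is discarded).
    Hypothetical helper for illustration only.
    """
    target = os.path.join(directory, formal_name(shuffle_id, map_id, reduce_id))
    tmp = os.path.join(directory, f".tmp-{uuid.uuid4().hex}")
    with open(tmp, "wb") as f:
        f.write(payload)
    try:
        # os.link raises FileExistsError if the target already exists, so the
        # existence check and the publish happen in one step.
        os.link(tmp, target)
        return True
    except FileExistsError:
        # Another attempt (for example a speculative task) won: give up.
        return False
    finally:
        # The temp name is removed in both cases; on success the data stays
        # reachable through the formal name.
        os.remove(tmp)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Two attempts that share the same mapId (partition ID 3).
        print(commit_shuffle_output(d, 0, 3, 1, b"attempt-0"))
        print(commit_shuffle_output(d, 0, 3, 1, b"attempt-1"))
        print(sorted(os.listdir(d)))
```

In this sketch the second attempt loses because both attempts map to the same formal name. If mapId were a value unique to each task attempt (what the comment calls TID), the two calls would target different names and neither would be discarded, which is the "no file name conflicts" property mentioned above.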