cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-519124459

BTW, another way to fix this problem is to always include the task id (not the task attempt id) in the shuffle block id. This works, but with a larger overhead:
1. The `MapOutputTracker` needs to track a task id per shuffle block, instead of one shuffle generation id per shuffle.
2. When the shuffle reader fetches the blocks of one shuffle, it needs to include one task id per shuffle block in the network request.
3. Even if there is no indeterminate stage, the overhead is still there.
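
To make the overhead comparison concrete, here is a rough sketch of the two fetch-request shapes. It is an illustrative model only, not Spark code: `GenerationFetchRequest`, `TaskIdFetchRequest`, `build_task_id_request`, and `extra_ids` are hypothetical names, and the field layout is an assumption rather than Spark's actual wire format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class GenerationFetchRequest:
    """Hypothetical request: one generation id covers the whole shuffle."""
    shuffle_id: int
    reduce_id: int
    generation_id: int
    map_indices: Tuple[int, ...]

    def extra_ids(self) -> int:
        # A single generation id, regardless of how many blocks are fetched.
        return 1


@dataclass(frozen=True)
class TaskIdFetchRequest:
    """Hypothetical request: every block carries the id of the task that wrote it."""
    shuffle_id: int
    reduce_id: int
    blocks: Tuple[Tuple[int, int], ...]  # (map_index, task_id) pairs

    def extra_ids(self) -> int:
        # One task id per requested block.
        return len(self.blocks)


def build_task_id_request(
    shuffle_id: int, reduce_id: int, task_id_by_map_index: Dict[int, int]
) -> TaskIdFetchRequest:
    # The tracker would have to hold this per-block mapping (point 1 above).
    blocks = tuple(sorted(task_id_by_map_index.items()))
    return TaskIdFetchRequest(shuffle_id, reduce_id, blocks)


by_generation = GenerationFetchRequest(
    shuffle_id=0, reduce_id=3, generation_id=1, map_indices=(0, 1, 2)
)
by_task = build_task_id_request(0, 3, {0: 17, 1: 18, 2: 42})

print(by_generation.extra_ids(), by_task.extra_ids())  # prints "1 3"
```

In this model the task-id variant grows linearly with the number of map outputs, and it pays that cost even when no stage is indeterminate, which matches point 3.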