squito commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-520482491

Sorry, I don't understand -- why don't you want to support speculative execution? Correctness before performance, yes. On the assumptions about task independence -- yes, that *was* the assumption, before this whole thread of issues related to non-deterministic tasks and stage retry. Fetch failures are more likely on large clusters and large workloads, which is precisely where speculative execution matters too. If we couldn't get the two to work together, then I would totally agree we should go for correctness. But I think using the global TID would give us the behavior we want.

I also think the global TID is simpler. For one, debugging is simpler -- the shuffle id is actually the shuffle id; there isn't some other state that is tracked separately to know which block to get.

If you're really just concerned about the overhead of an additional field in the shuffle block, I think you could even swap out the original map partition id for the TID of the map task (though that would be more complex in other ways: the `MapStatus` would need to track the TID, since its position in the array would no longer be sufficient).
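
To make that last suggestion concrete, here is a rough sketch of the idea in Python. It is not Spark's actual code: the class and field names (`ShuffleBlockId`, `MapStatus`, `MapOutputTracker`, `map_tid`) are simplified stand-ins chosen for illustration, and the "last registration wins" behavior is an assumption of the sketch only. It shows the block name carrying the map task's TID in place of the map partition id, with the `MapStatus` recording that TID because the array position only identifies the partition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ShuffleBlockId:
    """Hypothetical block id keyed by the map task's TID, not the partition index."""
    shuffle_id: int
    map_tid: int      # globally unique task attempt id (assumption of this sketch)
    reduce_id: int

    @property
    def name(self) -> str:
        return f"shuffle_{self.shuffle_id}_{self.map_tid}_{self.reduce_id}"


@dataclass(frozen=True)
class MapStatus:
    """Hypothetical map output record; it must remember which attempt wrote the output."""
    location: str
    map_tid: int


class MapOutputTracker:
    """Toy tracker: one slot per map partition, holding the registered attempt."""

    def __init__(self, num_map_partitions: int) -> None:
        self.statuses: List[Optional[MapStatus]] = [None] * num_map_partitions

    def register(self, map_partition: int, status: MapStatus) -> None:
        # In this sketch the most recent registration simply replaces the old one.
        self.statuses[map_partition] = status

    def blocks_for_reducer(self, shuffle_id: int, reduce_id: int) -> List[ShuffleBlockId]:
        blocks = []
        for partition, status in enumerate(self.statuses):
            if status is None:
                raise ValueError(f"missing map output for partition {partition}")
            # The array index gives the partition; the TID comes from the status itself.
            blocks.append(ShuffleBlockId(shuffle_id, status.map_tid, reduce_id))
        return blocks


# Two attempts of map partition 0 (say, a retry or a speculative copy) with TIDs 7 and 12.
tracker = MapOutputTracker(num_map_partitions=1)
tracker.register(0, MapStatus(location="exec-1", map_tid=7))
tracker.register(0, MapStatus(location="exec-2", map_tid=12))
blocks = tracker.blocks_for_reducer(shuffle_id=3, reduce_id=0)
```

Because each attempt writes under its own TID, the two attempts never share a block name, and the reducer fetches only the blocks of the attempt recorded in the `MapStatus`.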