cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a
shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-521637405
@vanzin I checked the code:
https://github.com/apache/spark/blob/1b416a0c77706ba352b
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a
shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-521499643
@vanzin I think your concern is valid. It seems the shuffle writing policy is
contradictor
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a
shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-520497403
After another look, I think speculative tasks are OK. When we run an
indeterminate shuffl
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a
shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-519329319
The basic assumption of speculative tasks is that the task output is
deterministic
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a
shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-519124459
BTW, another way to fix this problem is: always include the task id (not
task attempt i
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a
shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-519042612
@squito the problem we need to solve is
1. Spark may need to re-generate some shuffle
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a
shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-503389041
Hi @xuanyuanking, thanks for your great work! Can you extend the PR
description to expla