[GitHub] [spark] cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-15 Thread GitBox
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-521637405 @vanzin I checked the code:

[GitHub] [spark] cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-14 Thread GitBox
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-521499643 @vanzin I think your concern is valid. Seems the shuffle writing policy is

[GitHub] [spark] cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-12 Thread GitBox
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-520497403 After another look, I think speculative task is OK. When we run an indeterminate

[GitHub] [spark] cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-07 Thread GitBox
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-519329319 The basic assumption of the speculative task is that: the task output is deterministic

[GitHub] [spark] cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-07 Thread GitBox
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-519124459 BTW, another way to fix this problem is: always include the task id (not task attempt

[GitHub] [spark] cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-07 Thread GitBox
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-519042612 @squito the problem we need to solve is 1. Spark may need to re-generate some

[GitHub] [spark] cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-06-18 Thread GitBox
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-503389041 Hi @xuanyuanking thanks for your great work! can you extend the PR description to