Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17088
>> Also, does the issue here only arise when the shuffle service is enabled?
That is correct. When the shuffle service is not enabled, this change
should be a no-op.
Github user kayousterhout commented on the issue:
https://github.com/apache/spark/pull/17088
Can you update the JIRA and PR description to say "un-register the output
locations" (or similar) instead of "remove the files"? The current description
is misleading since nothing is
Github user markhamstra commented on the issue:
https://github.com/apache/spark/pull/17088
Even if I completely agreed that removing all of the shuffle files on a
host was the correct design choice, I'd still be hesitant to merge this right
now. That is simply because we have
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17088
>> This is quite drastic for a fetch failure: Spark already has mechanisms
>> in place to detect executor/host failure, which take care of these failure
>> modes.
Unfortunately, mechanisms
Github user ConeyLiu commented on the issue:
https://github.com/apache/spark/pull/17088
I agree with @mridulm: a file fetch failure does not imply that the executor
is down, or that all the executors on the host are down.
Github user sitalkedia commented on the issue:
https://github.com/apache/spark/pull/17088
>> A fetch failure does not imply a lost executor; it could be a transient
>> issue. Similarly, executor loss does not imply host loss.
You are right, it could be transient, but we do have
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/17088
A fetch failure does not imply a lost executor; it could be a transient issue.
Similarly, executor loss does not imply host loss.
This is quite drastic for a fetch failure: Spark already
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17088
**[Test build #73533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73533/testReport)**
for PR 17088 at commit