[GitHub] spark issue #17088: [SPARK-19753][CORE] All shuffle files on a host should b...

2017-02-28 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/17088 >> Also, does the issue here only arise when the shuffle service is enabled? That is correct. For the case when the shuffle service is not enabled, this change should be a no-op.

[GitHub] spark issue #17088: [SPARK-19753][CORE] All shuffle files on a host should b...

2017-02-28 Thread kayousterhout
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/17088 Can you update the JIRA and PR description to say "un-register the output locations" (or similar) instead of "remove the files"? The current description is misleading since nothing is

[GitHub] spark issue #17088: [SPARK-19753][CORE] All shuffle files on a host should b...

2017-02-27 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/17088 Even if I completely agreed that removing all of the shuffle files on a host was the correct design choice, I'd still be hesitant to merge this right now. That is simply because we have

[GitHub] spark issue #17088: [SPARK-19753][CORE] All shuffle files on a host should b...

2017-02-27 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/17088 >> This is quite drastic for a fetch failure: Spark already has mechanisms in place to detect executor/host failure - which take care of these failure modes. Unfortunately, mechanisms

[GitHub] spark issue #17088: [SPARK-19753][CORE] All shuffle files on a host should b...

2017-02-27 Thread ConeyLiu
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17088 I agree with @mridulm: a file fetch failure does not imply that the executor is down, or that all the executors on the host are down.

[GitHub] spark issue #17088: [SPARK-19753][CORE] All shuffle files on a host should b...

2017-02-27 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/17088 >> fetch failure does not imply lost executor - it could be a transient issue. Similarly, executor loss does not imply host loss. You are right, it could be transient, but we do have

[GitHub] spark issue #17088: [SPARK-19753][CORE] All shuffle files on a host should b...

2017-02-27 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17088 fetch failure does not imply lost executor - it could be a transient issue. Similarly, executor loss does not imply host loss. This is quite drastic for a fetch failure: Spark already

[GitHub] spark issue #17088: [SPARK-19753][CORE] All shuffle files on a host should b...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17088 **[Test build #73533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73533/testReport)** for PR 17088 at commit