[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread mgummelt
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17031 It depends on the application. It's the amount of time you have to wait before having the opportunity to use those resources again. But if you explicitly revive, which we do here whenever we
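
The refuse timeout described in this comment can be illustrated with a small toy model. This is a sketch only, assuming Mesos tracks decline filters per agent and that an explicit revive clears all filters the framework has set. `OfferFilterModel` and its method names are hypothetical and exist in neither Mesos nor Spark.

```python
from dataclasses import dataclass, field


@dataclass
class OfferFilterModel:
    """Toy model of one framework's decline filters on the Mesos master.

    Hypothetical class for illustration only; not part of Mesos or Spark.
    """
    # agent_id -> time (seconds) until which offers from that agent are withheld
    refused_until: dict = field(default_factory=dict)

    def decline(self, agent_id: str, now: float, refuse_seconds: float) -> None:
        # Declining with a refuse timeout hides this agent's resources
        # from the framework until now + refuse_seconds.
        self.refused_until[agent_id] = now + refuse_seconds

    def revive(self) -> None:
        # An explicit revive clears every filter the framework has set.
        self.refused_until.clear()

    def can_be_offered(self, agent_id: str, now: float) -> bool:
        # An agent with no recorded filter is always eligible.
        return now >= self.refused_until.get(agent_id, 0.0)


model = OfferFilterModel()
model.decline("agent-1", now=0.0, refuse_seconds=120.0)
print(model.can_be_offered("agent-1", now=60.0))   # inside the refuse window
print(model.can_be_offered("agent-1", now=120.0))  # refuse window has expired
model.decline("agent-1", now=120.0, refuse_seconds=120.0)
model.revive()
print(model.can_be_offered("agent-1", now=121.0))  # filter removed by revive
```

In this model a long refuse timeout only delays re-offers in the absence of a revive, which is the trade-off the comment describes.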

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread skonto
Github user skonto commented on the issue: https://github.com/apache/spark/pull/17031 Ok I see. LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17031 Merged build finished. Test PASSed.

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73531/

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17031 **[Test build #73531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73531/testReport)** for PR 17031 at commit

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread mgummelt
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17031 @skonto @susanxhuynh I've updated the solution to use a longer (120s) default refuse timeout, instead of suppressing offers. Please re-review. Just as the previous refuse seconds settings were

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17031 **[Test build #73531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73531/testReport)** for PR 17031 at commit

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread mgummelt
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17031 @skonto Cassandra supports suppress/revive

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread skonto
Github user skonto commented on the issue: https://github.com/apache/spark/pull/17031 Ok like the Cassandra case you mean right?

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread mgummelt
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17031 Given the concerns about the dispatcher being stuck in a suppressed state, I'm going to solve this a different way. I'm going to increase the default offer decline timeout to 120s and make it

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread skonto
Github user skonto commented on the issue: https://github.com/apache/spark/pull/17031 "The only way to fix this generally is to implement some periodic timer that calls reviveOffers() if there are queued/pending drivers to be scheduled. This can be chatty and complicates the code, so
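
The periodic timer mentioned in the quoted text could look roughly like the following. This is a minimal sketch, not code from the PR: `PeriodicReviver` is a hypothetical helper, and `revive_offers` and `has_queued_drivers` are caller-supplied callables standing in for the scheduler driver's `reviveOffers()` and the dispatcher's check for queued/pending drivers.

```python
import threading


class PeriodicReviver:
    """Calls revive_offers() every interval_s seconds while work is queued.

    Hypothetical helper for illustration; not part of Spark or Mesos.
    """

    def __init__(self, revive_offers, has_queued_drivers, interval_s):
        self._revive_offers = revive_offers
        self._has_queued_drivers = has_queued_drivers
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def tick(self):
        # One timer iteration: revive only if something is waiting to launch.
        if self._has_queued_drivers():
            self._revive_offers()
            return True
        return False

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            self.tick()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        # Joining is only valid if the thread was actually started.
        if self._thread.is_alive():
            self._thread.join()
```

Because the timer fires whether or not an earlier revive was lost, it would cover the stuck-in-suppressed-state case, at the cost of the extra traffic and code complexity the quote points out.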

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-27 Thread skonto
Github user skonto commented on the issue: https://github.com/apache/spark/pull/17031 @mgummelt Yes, they should look at the logs, but how do they know this is something that requires action from their side and not a cluster issue or anything else? It should be documented since it is

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-24 Thread mgummelt
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17031 @susanxhuynh I don't think it's worth documenting. It should be clear in the logs, which should be where an operator turns if they notice no jobs are launching.

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-24 Thread mgummelt
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17031 @susanxhuynh Mesos/Spark integration tests: https://github.com/typesafehub/mesos-spark-integration-tests. We run them as a subset of DC/OS Spark integration tests:

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-24 Thread susanxhuynh
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/17031 If we're concerned about the lost reviveOffers() and don't want to handle that corner case, do we want to document it somewhere for operators? "If jobs aren't running and you see [...] in the

[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...

2017-02-24 Thread susanxhuynh
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/17031 The suppress / revive logic LGTM. I didn't look that closely at the refactoring changes. Where are the Mesos/Spark integration tests that you mentioned? @mgummelt