Github user mgummelt commented on the issue:
https://github.com/apache/spark/pull/17031
It depends on the application. It's the amount of time you have to wait
before having the opportunity to use those resources again. But if you
explicitly revive, which we do here whenever we
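The decline-filter semantics being described can be sketched with a toy model: declining an offer with a `refuse_seconds` filter withholds that agent's resources for the duration of the window, while an explicit revive clears the filters so resources can be offered again immediately. This is an illustrative simulation, not the actual Mesos API:

```python
class OfferTracker:
    """Toy model of Mesos decline filters and reviveOffers() (illustrative only)."""

    def __init__(self):
        # agent_id -> timestamp before which the master withholds offers
        self.refuse_until = {}

    def decline(self, agent_id, refuse_seconds, now):
        # Declining with a filter: no offers from this agent until the
        # refuse window expires.
        self.refuse_until[agent_id] = now + refuse_seconds

    def revive(self):
        # An explicit revive clears all filters, so the framework can be
        # offered those resources again without waiting out the window.
        self.refuse_until.clear()

    def offerable(self, agent_id, now):
        return now >= self.refuse_until.get(agent_id, 0.0)
```

Without the revive, "the amount of time you have to wait" is exactly the remainder of the refuse window.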
Github user skonto commented on the issue:
https://github.com/apache/spark/pull/17031
Ok I see. LGTM.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17031
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17031
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73531/
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17031
**[Test build #73531 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73531/testReport)**
for PR 17031 at commit
Github user mgummelt commented on the issue:
https://github.com/apache/spark/pull/17031
@skonto @susanxhuynh I've updated the solution to use a longer (120s)
default refuse timeout, instead of suppressing offers. Please re-review. Just
as the previous refuse seconds settings were
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17031
**[Test build #73531 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73531/testReport)**
for PR 17031 at commit
Github user mgummelt commented on the issue:
https://github.com/apache/spark/pull/17031
@skonto Cassandra supports suppress/revive
Github user skonto commented on the issue:
https://github.com/apache/spark/pull/17031
OK, you mean like the Cassandra case, right?
---
Github user mgummelt commented on the issue:
https://github.com/apache/spark/pull/17031
Given the concerns about the dispatcher being stuck in a suppressed state,
I'm going to solve this a different way. I'm going to increase the default
offer decline timeout to 120s and make it
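If memory serves, the Spark-side knob for this is `spark.mesos.rejectOfferDuration` (treat the exact property name as an assumption; check the Mesos docs for your Spark version). With a 120s default, overriding it would look roughly like:

```shell
# Hypothetical invocation: raise the offer-decline (refuse) timeout to 120s
# so declined offers are not immediately re-offered to an idle framework.
# The property name is an assumption, not confirmed by this thread.
spark-submit \
  --master mesos://master:5050 \
  --conf spark.mesos.rejectOfferDuration=120s \
  --class org.example.MyApp myapp.jar
```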
Github user skonto commented on the issue:
https://github.com/apache/spark/pull/17031
"The only way to fix this generally is to implement some periodic timer
that calls reviveOffers() if there are queued/pending drivers to be scheduled.
This can be chatty and complicates the code, so
Github user skonto commented on the issue:
https://github.com/apache/spark/pull/17031
@mgummelt Yes, they should look at the logs, but how do they know this is
something that requires action on their side and not a cluster issue or
anything else? It should be documented since it is
Github user mgummelt commented on the issue:
https://github.com/apache/spark/pull/17031
@susanxhuynh I don't think it's worth documenting. It should be clear in
the logs, which should be where an operator turns if they notice no jobs are
launching.
---
Github user mgummelt commented on the issue:
https://github.com/apache/spark/pull/17031
@susanxhuynh Mesos/Spark integration tests:
https://github.com/typesafehub/mesos-spark-integration-tests. We run them as a
subset of DC/OS Spark integration tests:
Github user susanxhuynh commented on the issue:
https://github.com/apache/spark/pull/17031
If we're concerned about the lost reviveOffer() and don't want to handle
that corner case, do we want to document it somewhere for operators? "If jobs
aren't running and you see [...] in the
Github user susanxhuynh commented on the issue:
https://github.com/apache/spark/pull/17031
The suppress / revive logic LGTM. I didn't look that closely at the
refactoring changes. Where are the Mesos/Spark integration tests that you
mentioned? @mgummelt
---