GitHub user mgummelt opened a pull request:
https://github.com/apache/spark/pull/17031
[SPARK-19702] Add suppress/revive support to the Mesos Spark Dispatcher
## What changes were proposed in this pull request?
Adds suppress/revive support to the Mesos Spark Dispatcher to prevent it from
starving other frameworks. See the JIRA for details. The majority of the lines
changed in this PR are superficial refactoring to fix up the
`MesosClusterScheduler` class, which was rife with poor naming and code
organization. The meat of the changes is pointed out in the comments.
The Dispatcher should be suppressed when no drivers are queued or pending
retry. Whenever the queues defining these two sets are modified, we may need
to call `suppressOffers()` or `reviveOffers()`. We only do so if we aren't
already suppressed or revived, respectively. Strictly speaking, we can never
know whether we are suppressed or revived, because remote driver calls don't
guarantee delivery. In the low-probability event that a revive call fails, the
scheduler may think it's revived when it's really suppressed. This could
starve the queued drivers. The operator would then have to manually restart
the dispatcher, at which point the dispatcher would call `reviveOffers()`
again. The only general fix is a periodic timer that calls `reviveOffers()`
whenever there are queued or pending-retry drivers to be scheduled. This can
be chatty and complicates the code, so I haven't implemented it here.
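As a rough illustration of the logic described above, here is a minimal Python sketch. It is not the code in this PR: `FakeDriver`, `OfferGate`, and all method names are hypothetical stand-ins, with `suppress_offers`/`revive_offers` mirroring the driver's `suppressOffers()`/`reviveOffers()` calls. The initial "not suppressed" state is an assumption, and `periodic_revive` shows the timer-based fallback that this PR deliberately does not implement.

```python
class FakeDriver:
    """Hypothetical stand-in for the scheduler driver; it only records calls."""

    def __init__(self):
        self.calls = []

    def suppress_offers(self):
        self.calls.append("suppress")

    def revive_offers(self):
        self.calls.append("revive")


class OfferGate:
    """Tracks the two driver queues and the suppress state we *believe* we are in."""

    def __init__(self, driver):
        self.driver = driver
        self.queued = []
        self.pending_retry = []
        # Assumption: a freshly started dispatcher is receiving offers.
        self.believed_suppressed = False

    def has_work(self):
        return bool(self.queued or self.pending_retry)

    def enqueue(self, driver_id):
        self.queued.append(driver_id)
        self._sync()

    def enqueue_retry(self, driver_id):
        self.pending_retry.append(driver_id)
        self._sync()

    def remove(self, driver_id):
        # Drop the driver from both queues (e.g. once launched or killed).
        self.queued = [d for d in self.queued if d != driver_id]
        self.pending_retry = [d for d in self.pending_retry if d != driver_id]
        self._sync()

    def _sync(self):
        # Only call the driver when the believed state actually has to change.
        if self.has_work() and self.believed_suppressed:
            self.driver.revive_offers()
            self.believed_suppressed = False
        elif not self.has_work() and not self.believed_suppressed:
            self.driver.suppress_offers()
            self.believed_suppressed = True

    def periodic_revive(self):
        # Timer fallback (not part of this PR): revive unconditionally while
        # there is work, in case an earlier revive call was never delivered.
        if self.has_work():
            self.driver.revive_offers()
            self.believed_suppressed = False
```

In this sketch every queue mutation goes through `_sync()`, so redundant suppress or revive calls are skipped. A lost revive call would leave `believed_suppressed` out of step with reality, which is the failure mode that only something like `periodic_revive` would repair.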
## How was this patch tested?
Unit tests, manual testing, and the Mesos/Spark integration test suite.
cc @susanxhuynh @skonto @jmlvanre
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mesosphere/spark SPARK-19702-suppress-revive
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17031.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17031
----
commit 98604f833e055cdac13506d9122caad8a5ff0a89
Author: Michael Gummelt <[email protected]>
Date: 2017-02-22T23:30:39Z
Add suppress/revive support to the Mesos Spark Dispatcher
commit a16a4297131f1d4529569509b597e3178ad60d93
Author: Michael Gummelt <[email protected]>
Date: 2017-02-23T00:29:49Z
add tests
----