GitHub user mgummelt opened a pull request:

    https://github.com/apache/spark/pull/17031

    [SPARK-19702] Add suppress/revive support to the Mesos Spark Dispatcher

    ## What changes were proposed in this pull request?
    
    Adds suppress/revive support to the Mesos Spark Dispatcher to prevent 
starving other frameworks.  See JIRA for details.  The majority of the lines 
changed in this PR are superficial refactoring to fix up the 
`MesosClusterScheduler` class, which was rife with poor naming and code 
organization.  The substantive changes are pointed out in the comments.
    
    The Dispatcher should be suppressed when no drivers are queued or pending 
retry.  Whenever the queues defining these two sets are modified, we may need 
to call `suppressOffers()` or `reviveOffers()`.  We only do so if we aren't 
already suppressed or revived, respectively.  Strictly speaking, we can never 
know whether we are suppressed or revived, because remote driver calls don't 
guarantee delivery.  In the low-probability event that a revive call is lost, 
the scheduler may think it's revived when it's really suppressed.  This could 
starve the dispatcher's own queued drivers, which would never receive offers.  
The operator would have to manually restart the dispatcher, at which time the 
dispatcher would again call `reviveOffers()`.  The only general fix is a 
periodic timer that calls `reviveOffers()` whenever there are queued/pending 
drivers to be scheduled.  This can be chatty and complicates the code, so I 
haven't implemented it here.
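
    The suppress/revive rule described above can be sketched roughly as 
follows.  This is an illustration only, not the code in this PR: 
`OfferDriver`, `OfferStateTracker`, and all of their method and field names 
are hypothetical stand-ins, and the real `MesosClusterScheduler` is organized 
differently.  The sketch only shows the "call the driver on a state change, 
and only if we believe we are not already in that state" idea.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for the two driver calls named in the description.
interface OfferDriver {
    void suppressOffers();
    void reviveOffers();
}

// Tracks what the dispatcher *believes* its offer state to be.
class OfferStateTracker {
    private final OfferDriver driver;
    private final Queue<String> queuedDrivers = new ArrayDeque<>();
    private final Queue<String> pendingRetryDrivers = new ArrayDeque<>();
    // Believed state only: a lost revive call would leave this flag wrong.
    private boolean suppressed = false;

    OfferStateTracker(OfferDriver driver) {
        this.driver = driver;
    }

    void enqueue(String submissionId) {
        queuedDrivers.add(submissionId);
        updateOfferState();
    }

    void enqueueRetry(String submissionId) {
        pendingRetryDrivers.add(submissionId);
        updateOfferState();
    }

    // Removes the next driver to launch (retries first), or returns null.
    String launchNext() {
        String next = pendingRetryDrivers.isEmpty()
            ? queuedDrivers.poll()
            : pendingRetryDrivers.poll();
        updateOfferState();
        return next;
    }

    boolean isSuppressed() {
        return suppressed;
    }

    // Called after every queue modification; talks to the driver only
    // when the believed state actually has to change.
    private void updateOfferState() {
        boolean idle = queuedDrivers.isEmpty() && pendingRetryDrivers.isEmpty();
        if (idle && !suppressed) {
            driver.suppressOffers();
            suppressed = true;
        } else if (!idle && suppressed) {
            driver.reviveOffers();
            suppressed = false;
        }
    }
}
```

    In this sketch the `suppressed` flag is flipped as soon as the call is 
issued, which is exactly why a lost revive call would go unnoticed.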
    
    ## How was this patch tested?
    
    Unit tests, manual testing, and the Mesos/Spark integration test suite.
    
    cc @susanxhuynh @skonto @jmlvanre


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mesosphere/spark SPARK-19702-suppress-revive

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17031.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17031
    
----
commit 98604f833e055cdac13506d9122caad8a5ff0a89
Author: Michael Gummelt <[email protected]>
Date:   2017-02-22T23:30:39Z

    Add suppress/revive support to the Mesos Spark Dispatcher

commit a16a4297131f1d4529569509b597e3178ad60d93
Author: Michael Gummelt <[email protected]>
Date:   2017-02-23T00:29:49Z

    add tests

----

