GitHub user squito opened a pull request:

    https://github.com/apache/spark/pull/14079

    [SPARK-8425][CORE] New Blacklist Mechanism

    ## What changes were proposed in this pull request?
    
    This adds a blacklisting mechanism for Spark to handle bad executors and
nodes, e.g. when a disk fails.  Previously there was an undocumented
blacklisting mechanism, but it was very limited.  This supports blacklisting
individual tasks on an executor, executors and nodes for an entire stage, and
executors and nodes across an entire Spark application.  Full details are in a
design doc attached to the JIRA.
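
The three blacklisting scopes described above (per task, per stage, per application) can be illustrated with a minimal sketch. It is not the BlacklistTracker from this pull request: the class name `BlacklistSketch`, both thresholds, and the promotion rules are assumptions made for illustration, and only executors are tracked (nodes could be handled the same way, keyed by host).

```scala
import scala.collection.mutable

// Illustrative only: the class name, thresholds, and rules are assumptions,
// not the BlacklistTracker implemented in this pull request.
final class BlacklistSketch(maxFailedTasksPerStage: Int, maxFailedStagesPerApp: Int) {
  // (stageId, executorId) -> indices of tasks that failed on that executor
  private val failedTasks = mutable.Map.empty[(Int, String), mutable.Set[Int]]
  // executorId -> stages in which the executor was blacklisted
  private val blacklistedStages = mutable.Map.empty[String, mutable.Set[Int]]

  def recordFailure(stageId: Int, taskIndex: Int, executorId: String): Unit = {
    val tasks = failedTasks.getOrElseUpdate((stageId, executorId), mutable.Set.empty[Int])
    tasks += taskIndex
    // Enough distinct task failures in one stage: blacklist for the whole stage.
    if (tasks.size >= maxFailedTasksPerStage) {
      blacklistedStages.getOrElseUpdate(executorId, mutable.Set.empty[Int]) += stageId
    }
  }

  // Task scope: do not retry a task on an executor where it already failed.
  def isBlacklistedForTask(stageId: Int, taskIndex: Int, executorId: String): Boolean =
    failedTasks.get((stageId, executorId)).exists(_.contains(taskIndex))

  // Stage scope: the executor receives no more tasks from this stage.
  def isBlacklistedForStage(stageId: Int, executorId: String): Boolean =
    blacklistedStages.get(executorId).exists(_.contains(stageId))

  // Application scope: the executor was blacklisted in enough stages.
  def isBlacklistedForApp(executorId: String): Boolean =
    blacklistedStages.get(executorId).exists(_.size >= maxFailedStagesPerApp)
}
```

In this sketch a scheduler would consult the three predicates, narrowest scope first, before offering a task to an executor. The actual thresholds and promotion rules are the ones in the design doc.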
    
    ## How was this patch tested?
    
    Added unit tests and ran them via Jenkins; also ran a handful of them in a
loop to check for flakiness.
    
    The added tests include:
    * verifying BlacklistTracker works correctly
    * verifying TaskSchedulerImpl interacts with BlacklistTracker correctly 
(via a mock BlacklistTracker)
    * an integration test for the entire scheduler with blacklisting in a few 
different scenarios
    
    TODO
    [ ] More testing on a real cluster
    [ ] Fix & test handling of shuffle-fetch failures
    [ ] Documentation

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark blacklist-SPARK-8425

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14079.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14079
    
----
commit 9a6aaede723b206a95a616ae740dcc5ed6b74cbc
Author: mwws <[email protected]>
Date:   2015-12-29T06:01:17Z

    enhance blacklist mechanism
    
    1. Create a new BlacklistTracker and a BlacklistStrategy interface to
    support complex use cases for the blacklist mechanism.
    2. Make the YARN allocator aware of node blacklist information.
    3. Implement three strategies for convenience; users can also define
    their own strategy:
    SingleTaskStrategy: retains the default behavior from before this change.
    AdvanceSingleTaskStrategy: enhances SingleTaskStrategy by supporting
    a stage-level node blacklist.
    ExecutorAndNodeStrategy: different taskSets can share blacklist
    information.

commit 5bfe94106f8c49f2c3d60d13bd2bb6389be94e4f
Author: Imran Rashid <[email protected]>
Date:   2016-05-10T17:49:05Z

    Update for new design

----


