GitHub user LucaCanali opened a pull request:

    https://github.com/apache/spark/pull/19039

    Add feature to permanently blacklist a user-specified list of nodes, SPARK-21829
    
    ## What changes were proposed in this pull request?
    
    With this feature I propose a mechanism that lets users specify a list of cluster nodes on which executors/tasks should not run for a specific job.
    The proposed implementation, which I have tested (see PR), reuses the Spark blacklist mechanism. The nodes listed in the new parameter spark.blacklist.alwaysBlacklistedNodes are added to the blacklist when the SparkContext starts, and they never expire.
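To make the intended semantics concrete, here is a minimal, hypothetical Java model. It is not Spark's BlacklistTracker and not code from this PR. The class and method names are invented for illustration, and the comma-separated format of spark.blacklist.alwaysBlacklistedNodes is an assumption. Failure-based entries expire after a timeout, while nodes from the configuration are loaded once at startup and never expire.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the proposed semantics; NOT Spark's
// BlacklistTracker and not code from this PR.
public class NodeBlacklistModel {
    // Nodes from spark.blacklist.alwaysBlacklistedNodes (comma-separated format assumed).
    private final Set<String> alwaysBlacklisted = new HashSet<>();
    // Failure-based entries: node -> time (ms) at which the entry expires.
    private final Map<String, Long> expiryByNode = new HashMap<>();
    private final long timeoutMs;

    // Called once, standing in for "SparkContext start" in this model.
    public NodeBlacklistModel(String alwaysBlacklistedNodes, long timeoutMs) {
        this.timeoutMs = timeoutMs;
        for (String node : alwaysBlacklistedNodes.split(",")) {
            String host = node.trim();
            if (!host.isEmpty()) {
                alwaysBlacklisted.add(host);
            }
        }
    }

    // Regular blacklisting after task failures: the entry expires after timeoutMs.
    public void blacklistAfterFailures(String node, long nowMs) {
        expiryByNode.put(node, nowMs + timeoutMs);
    }

    // Drops expired failure-based entries; user-specified nodes are never touched.
    public void applyExpiry(long nowMs) {
        expiryByNode.values().removeIf(expiry -> expiry <= nowMs);
    }

    public boolean isNodeBlacklisted(String node) {
        return alwaysBlacklisted.contains(node) || expiryByNode.containsKey(node);
    }

    public static void main(String[] args) {
        NodeBlacklistModel blacklist = new NodeBlacklistModel("node07, node12", 60_000L);
        blacklist.blacklistAfterFailures("node03", 0L);
        blacklist.applyExpiry(120_000L);
        System.out.println(blacklist.isNodeBlacklisted("node07")); // true: never expires
        System.out.println(blacklist.isNodeBlacklisted("node03")); // false: expired
    }
}
```

If the value is indeed a comma-separated host list (again an assumption), a job would presumably set it like any other Spark property, e.g. `--conf spark.blacklist.alwaysBlacklistedNodes=host1,host2` on spark-submit.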
    I have tested this on a YARN cluster with a case taken from the original production problem and measured a performance improvement of about 5x for that specific test case. I expect there are other cases where Spark users may want to blacklist a set of nodes, notably troubleshooting, including cases where certain nodes/executors are slow for a given workload because of external agents, so the anomaly is not picked up by the cluster manager.
    See also SPARK-21829
    
    ## How was this patch tested?
    
    A test has been added to BlacklistTrackerSuite.
    The patch has also been successfully tested manually on a YARN cluster.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/LucaCanali/spark SchedulerAlwaysBlacklistedNodes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19039.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19039
    
----
commit 4932d3767e3c3bb24a94a7422b7b8e6f7a5bf08b
Author: LucaCanali
Date:   2017-08-24T13:27:54Z

    Add feature to permanently blacklist a user-specified list of nodes, SPARK-21829

----

