GitHub user squito opened a pull request:

    https://github.com/apache/spark/pull/15249

    [SPARK-17675] [CORE] Expand Blacklist for TaskSets

    ## What changes were proposed in this pull request?
    
    This is a step along the way to SPARK-8425.
    
    To enable incremental review, the first step proposed here is to expand the
    blacklisting within tasksets. Specifically, this will enable blacklisting for:

    * (task, executor) pairs (this already exists via an undocumented config)
    * (task, node)
    * (taskset, executor)
    * (taskset, node)

    Adding (task, node) is critical to making Spark tolerant of one bad disk in
    a cluster, without requiring careful tuning of "spark.task.maxFailures".
    The other additions are also important to avoid many misleading task
    failures and long scheduling delays when there is one bad node on a large
    cluster.
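
The four blacklist levels listed above can be pictured with a small, self-contained sketch. This is not the code from the pull request: the class name `TaskSetBlacklistSketch`, its method names, and the four threshold parameters are hypothetical and chosen only for illustration, and the real Spark implementation may count failures differently.

```python
from collections import defaultdict


class TaskSetBlacklistSketch:
    """Hypothetical illustration of per-taskset blacklisting (not Spark's API)."""

    def __init__(self, max_task_attempts_per_executor=1,
                 max_task_attempts_per_node=2,
                 max_failed_tasks_per_executor=2,
                 max_failed_executors_per_node=2):
        # All four thresholds are assumed names and defaults.
        self.max_task_attempts_per_executor = max_task_attempts_per_executor
        self.max_task_attempts_per_node = max_task_attempts_per_node
        self.max_failed_tasks_per_executor = max_failed_tasks_per_executor
        self.max_failed_executors_per_node = max_failed_executors_per_node
        # (task, executor) -> number of failed attempts
        self.task_exec_failures = defaultdict(int)
        # (task, node) -> number of failed attempts
        self.task_node_failures = defaultdict(int)
        # executor -> distinct tasks that have failed on it
        self.exec_failed_tasks = defaultdict(set)
        # executors blacklisted for the whole taskset
        self.blacklisted_execs = set()
        # node -> executors on that node blacklisted for the whole taskset
        self.node_blacklisted_execs = defaultdict(set)

    def record_failure(self, task, executor, node):
        """Update every counter after one failed task attempt."""
        self.task_exec_failures[(task, executor)] += 1
        self.task_node_failures[(task, node)] += 1
        self.exec_failed_tasks[executor].add(task)
        if len(self.exec_failed_tasks[executor]) >= self.max_failed_tasks_per_executor:
            self.blacklisted_execs.add(executor)
            self.node_blacklisted_execs[node].add(executor)

    def is_task_blacklisted_on_executor(self, task, executor):
        count = self.task_exec_failures.get((task, executor), 0)
        return count >= self.max_task_attempts_per_executor

    def is_task_blacklisted_on_node(self, task, node):
        count = self.task_node_failures.get((task, node), 0)
        return count >= self.max_task_attempts_per_node

    def is_executor_blacklisted(self, executor):
        return executor in self.blacklisted_execs

    def is_node_blacklisted(self, node):
        bad_execs = self.node_blacklisted_execs.get(node, set())
        return len(bad_execs) >= self.max_failed_executors_per_node

    def can_run(self, task, executor, node):
        """A task may be scheduled only if none of the four levels blocks it."""
        return not (self.is_task_blacklisted_on_executor(task, executor)
                    or self.is_task_blacklisted_on_node(task, node)
                    or self.is_executor_blacklisted(executor)
                    or self.is_node_blacklisted(node))
```

In this sketch, a task that fails on two executors of the same node stops being offered to any executor on that node, which is the behavior that would protect a job from a single bad disk without raising the overall failure limit.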
    
    Note that some of the code changes here are not really required for just
    this change; they put pieces in place for SPARK-8425 even though they are
    not used yet (e.g., the `BlacklistTracker` helper is a little out of place,
    and `TaskSetBlacklist` holds onto a little more info than it needs to for
    just this change).
    
    ## How was this patch tested?
    
    Added unit tests; ran tests via Jenkins.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark taskset_blacklist_only

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15249
    
----
commit 9a6aaede723b206a95a616ae740dcc5ed6b74cbc
Author: mwws <[email protected]>
Date:   2015-12-29T06:01:17Z

    enhance blacklist mechanism
    
    1. create new BlacklistTracker and BlacklistStrategy interface to
       support complex use case for blacklist mechanism.
    2. make Yarn allocator aware of node blacklist information
    3. three strategies implemented for convenience, also user can define
       his own strategy
       SingleTaskStrategy: remain default behavior before this change.
       AdvanceSingleTaskStrategy: enhance SingleTaskStrategy by supporting
       stage level node blacklist
       ExecutorAndNodeStrategy: different taskSet can share blacklist
       information.

commit 5bfe94106f8c49f2c3d60d13bd2bb6389be94e4f
Author: Imran Rashid <[email protected]>
Date:   2016-05-10T17:49:05Z

    Update for new design

commit d7adc6711e15994cefc280666486f54efb0f95cd
Author: Imran Rashid <[email protected]>
Date:   2016-07-07T20:32:51Z

    (node,task) blacklisting

commit a34e9aeb695958c749d306595d1adebe0207fdf9
Author: Imran Rashid <[email protected]>
Date:   2016-07-07T20:52:43Z

    go back to having the blacklist tracker as an option, rather than the no-op
    implementation

commit cf5837410818dae093ef15617cb42336a14408db
Author: Imran Rashid <[email protected]>
Date:   2016-07-07T21:32:09Z

    dont count shuffle-fetch failures

commit 7fcb266f7816de8127807017a755d518ab5d314e
Author: Imran Rashid <[email protected]>
Date:   2016-07-08T16:57:42Z

    make sure we clear the (node, task) blacklist on stage completion, add to
    test

commit 487eb66f850ba6cf68cc814d03b7daa085f591b4
Author: Imran Rashid <[email protected]>
Date:   2016-07-11T21:42:15Z

    review feedback

commit c22aaad76f07cbe58ea455d18959470e7afb1498
Author: Imran Rashid <[email protected]>
Date:   2016-07-12T01:55:39Z

    fix

commit dc2b3ed4cc256fdb588947bf293d7517c754c45b
Author: Imran Rashid <[email protected]>
Date:   2016-07-13T11:28:49Z

    all taskset specific blacklisting is now in TaskSetManager

commit 338db65e75c57bdc4af34ee342bc4ab843468b90
Author: Imran Rashid <[email protected]>
Date:   2016-07-13T11:31:03Z

    fix

commit fa3e34aceb5a20aa7a0324b50d55e71e9a50ae6d
Author: Imran Rashid <[email protected]>
Date:   2016-07-13T16:30:28Z

    Merge branch 'master' into blacklist-SPARK-8425

commit 16afb431f3827d10c622d0f8ec436a8c01c961fb
Author: Imran Rashid <[email protected]>
Date:   2016-07-14T15:58:39Z

    review feedback

commit 7aff08a52099334eb6ac242c7ba70a9873aef624
Author: Imran Rashid <[email protected]>
Date:   2016-07-14T21:22:57Z

    review feedback

commit e181546bbcce249ace26c952666493a187f95437
Author: Imran Rashid <[email protected]>
Date:   2016-07-14T21:26:54Z

    rename conf

commit 351a9a7e2893a0b90c57233d5e44a52c147bb2a8
Author: Imran Rashid <[email protected]>
Date:   2016-07-14T21:45:24Z

    use typed confs consistently

commit 572c7773c44010428e4c1b39a4b03fdbf103a3d6
Author: Imran Rashid <[email protected]>
Date:   2016-07-20T17:11:15Z

    docs

commit 8cebb01a0ad19eeccd4143557b7cf22836041c70
Author: Imran Rashid <[email protected]>
Date:   2016-07-20T17:35:04Z

    api simplification

commit dbf904e80ad892b76d29bf0092db5810d14b3271
Author: Imran Rashid <[email protected]>
Date:   2016-07-20T18:49:07Z

    review feedback

commit f0de0db8c54ac2dc66b0ab59b1b6c155a6775014
Author: Imran Rashid <[email protected]>
Date:   2016-07-20T21:02:14Z

    fix for config name change

commit 8a12adf445b00e8841eb3df071c0b6adee6c16da
Author: Imran Rashid <[email protected]>
Date:   2016-07-22T17:16:46Z

    exclude killed tasks and preempted tasks from blacklist

commit c9e3662dd468fe262fcb663ca1898cac0029f8d7
Author: Imran Rashid <[email protected]>
Date:   2016-07-26T16:20:58Z

    combine imports

commit 497e6268b46549502b615b487ebf26361ecf4623
Author: Imran Rashid <[email protected]>
Date:   2016-08-11T21:26:16Z

    review feedback

commit 515b18a6d15b1385caf6e79a849e3397e37e8602
Author: Imran Rashid <[email protected]>
Date:   2016-08-18T15:16:36Z

    add task timeouts

commit f0428b46e7de9837fa30850be85b7fb482191a48
Author: Imran Rashid <[email protected]>
Date:   2016-08-18T15:24:47Z

    separate datastructure to track blacklisted execs in a tsm, to simplify code

commit a5fbce754a4ec8371a85439f88f87ddc15229fc2
Author: Imran Rashid <[email protected]>
Date:   2016-08-18T15:33:03Z

    Merge branch 'master' into blacklist-SPARK-8425

commit b582d8e84fc17d908b43b91ad819acfc5b4b79fc
Author: Imran Rashid <[email protected]>
Date:   2016-08-18T15:34:49Z

    fix missing import

commit cec36c93c16d3b3f02a4720649c187595471663d
Author: Imran Rashid <[email protected]>
Date:   2016-08-18T15:47:24Z

    fix line wrapping

commit 290b3154d8cc179f6f18742ce53f50861fcc2c9c
Author: Imran Rashid <[email protected]>
Date:   2016-08-18T20:06:16Z

    fix test by turning off blacklist

commit 8c58ad96db948bec95f2e99fb9298db481e60b58
Author: Imran Rashid <[email protected]>
Date:   2016-08-18T22:04:48Z

    unused import

commit f01278038b5ee5cfdb80de9b254ae9fab0bd5f52
Author: Imran Rashid <[email protected]>
Date:   2016-08-22T15:34:32Z

    review feedback

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
