GitHub user squito opened a pull request:
https://github.com/apache/spark/pull/14079
[SPARK-8425][CORE] New Blacklist Mechanism
## What changes were proposed in this pull request?
This adds a blacklisting mechanism for Spark to handle bad executors and
nodes, e.g. when a disk fails. Previously there was an undocumented
blacklisting mechanism, but it was very limited. The new mechanism supports
blacklisting individual tasks on an executor, executors and nodes for an
entire stage, and executors and nodes across an entire Spark application.
Full details are in a design doc attached to the JIRA.
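
To make the three scopes concrete, here is a minimal toy sketch in Python. It is not the code in this patch: the class name `ToyBlacklistTracker`, its methods, and all thresholds are hypothetical, it tracks executors only (nodes would follow the same pattern), and the real counting rules are the ones in the design doc.

```python
from collections import defaultdict


class ToyBlacklistTracker:
    """Toy model of the three blacklist scopes; NOT Spark's BlacklistTracker."""

    def __init__(self, max_task_failures=1, max_stage_failures=2,
                 max_app_failures=2):
        # All thresholds are invented for illustration.
        self.max_task_failures = max_task_failures
        self.max_stage_failures = max_stage_failures
        self.max_app_failures = max_app_failures
        # (stage, task, executor) -> number of failed attempts
        self.task_failures = defaultdict(int)
        # (stage, executor) -> set of distinct tasks that failed there
        self.stage_failed_tasks = defaultdict(set)
        # (stage, executor) pairs blacklisted for that stage
        self.stage_blacklist = set()
        # executor -> number of stages that blacklisted it
        self.app_counts = defaultdict(int)
        # executors blacklisted for the whole application
        self.app_blacklist = set()

    def record_failure(self, stage, task, executor):
        """Record one failed task attempt and escalate the scope if needed."""
        self.task_failures[(stage, task, executor)] += 1
        failed = self.stage_failed_tasks[(stage, executor)]
        failed.add(task)
        key = (stage, executor)
        if len(failed) >= self.max_stage_failures and key not in self.stage_blacklist:
            # Enough distinct tasks failed: blacklist for the entire stage.
            self.stage_blacklist.add(key)
            self.app_counts[executor] += 1
            if self.app_counts[executor] >= self.max_app_failures:
                # Blacklisted in enough stages: blacklist application-wide.
                self.app_blacklist.add(executor)

    def is_blacklisted(self, stage, task, executor):
        """Check the widest scope first, then stage, then the single task."""
        if executor in self.app_blacklist:
            return True
        if (stage, executor) in self.stage_blacklist:
            return True
        attempts = self.task_failures.get((stage, task, executor), 0)
        return attempts >= self.max_task_failures
```

With these made-up thresholds, one failure keeps only that task off the executor, two distinct failing tasks in a stage keep the whole stage off it, and two such stages remove it for the rest of the application.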
## How was this patch tested?
Added unit tests, ran them via Jenkins, and also ran a handful of them in a
loop to check for flakiness.
The added tests include:
* verifying BlacklistTracker works correctly
* verifying TaskSchedulerImpl interacts with BlacklistTracker correctly
(via a mock BlacklistTracker)
* an integration test for the entire scheduler with blacklisting in a few
different scenarios
TODO
[ ] More testing on a real cluster
[ ] Fix & test handling of shuffle-fetch failures
[ ] Documentation
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/squito/spark blacklist-SPARK-8425
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14079.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14079
----
commit 9a6aaede723b206a95a616ae740dcc5ed6b74cbc
Author: mwws <[email protected]>
Date: 2015-12-29T06:01:17Z
enhance blacklist mechanism
1. create new BlacklistTracker and BlacklistStrategy interface to support complex use case for blacklist mechanism.
2. make Yarn allocator aware of node blacklist information
3. three strategies implemented for convenience, also user can define his own strategy
SingleTaskStrategy: remain default behavior before this change.
AdvanceSingleTaskStrategy: enhance SingleTaskStrategy by supporting stage level node blacklist
ExecutorAndNodeStrategy: different taskSet can share blacklist information.
commit 5bfe94106f8c49f2c3d60d13bd2bb6389be94e4f
Author: Imran Rashid <[email protected]>
Date: 2016-05-10T17:49:05Z
Update for new design
----