Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/15249
@kayousterhout When an executor or node is shutting down, the failure applies
at the driver level (not just the taskset level), since every task scheduled
on a shutting-down executor will fail.
But if the problem is a transient resource issue, other tasks in the
taskset can still succeed (data skew, for example).
The primary motivation for adding the blacklist initially was the former
(executor shutdown), but it also came to be used for the latter: tasks that
fail because of a resource issue caused by skew, and that keep getting
scheduled on the same executor because of locality information.
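
To make the two scopes concrete, here is a minimal sketch in Python. It is a toy
model, not Spark's scheduler code: the class `BlacklistSketch`, its method
names, and the executor and task identifiers are all hypothetical, and it
assumes that a transient failure excludes an executor only for the task that
failed there, while a shutdown excludes it for everything.

```python
from collections import defaultdict


class BlacklistSketch:
    """Toy model of the two blacklist scopes; all names are hypothetical."""

    def __init__(self):
        # Executors excluded for every task (e.g. executor or node shutting down).
        self.driver_level = set()
        # task id -> executors excluded for that task only (e.g. transient
        # resource failure on a skewed task).
        self.task_level = defaultdict(set)

    def on_executor_shutdown(self, executor_id):
        # Every task would fail here, so exclude the executor globally.
        self.driver_level.add(executor_id)

    def on_task_failure(self, task_id, executor_id):
        # Other tasks may still succeed here, so exclude it for this task only.
        self.task_level[task_id].add(executor_id)

    def can_schedule(self, task_id, executor_id):
        if executor_id in self.driver_level:
            return False
        # .get avoids creating empty entries for tasks that never failed.
        return executor_id not in self.task_level.get(task_id, set())


bl = BlacklistSketch()
bl.on_executor_shutdown("exec-1")       # shutting down: unusable by any task
bl.on_task_failure("task-7", "exec-2")  # skewed task failed on exec-2

print(bl.can_schedule("task-8", "exec-1"))  # False: driver-level exclusion
print(bl.can_schedule("task-7", "exec-2"))  # False: excluded for this task
print(bl.can_schedule("task-8", "exec-2"))  # True: other tasks may still run
```

The task-level entry is what stops locality preferences from sending the same
failing task back to the same executor, without penalizing other tasks that
could run there.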