Github user kayousterhout commented on the issue:
https://github.com/apache/spark/pull/15249
I thought about this a little more and had some offline discussion with
Imran. @mridulm, I re-read all of your comments and it sounds like there are
two issues that are addressed by the old blacklisting mechanism. I've
attempted to summarize the state of things here:
### (1) Executor Shutdown
**Problem**: Sometimes executors are in the process of being killed, and
tasks get scheduled on them during this period. It's bad if we don't do
anything about this, because (e.g., due to locality) a task could repeatedly
get re-scheduled on that executor, eventually causing the task to exhaust its
max number of failures and the job to be aborted.
**Approach in current Spark**: A per-task blacklist avoids re-scheduling
tasks on the same executor for a configurable period of time (by default, 0).
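For concreteness, here's a minimal sketch of how that legacy per-task blacklist gets configured (the property name and default below are my understanding of the pre-existing mechanism, so treat them as an assumption and check against the version you're running):

```scala
import org.apache.spark.SparkConf

// Sketch of the old per-task blacklist knob (assumed name/default).
// A failed task won't be re-offered to the same executor for this many
// milliseconds; the default of 0 means it can be retried there immediately.
val conf = new SparkConf()
  .set("spark.scheduler.executorTaskBlacklistTime", "5000") // 5s instead of the default 0
```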
**Approach in this PR**: After a configurable number of failures (default
1), the executor will be permanently blacklisted. For the executor that's
shutting down, any tasks run on it will be permanently blacklisted from that
executor, and the executor may eventually be permanently blacklisted for the
task set. This seems like a non-issue since the executor is shutting down
anyway, so at the executor level, this new approach works at least as well as
the old approach.
On a HOST level, the new approach is different: the failures on that
executor will count towards the max number of failures on the host where the
executor is/was running. This could be problematic if there are other
executors on the same host. For example, if the max failures per host is 1, or
multiple executors on one host have this shutdown issue, the entire host will
be permanently blacklisted for the task set. Does this seem like an issue to
folks? I'm not very familiar with YARN's allocation policies, but it seems like
if it's common to have many executors per host, a user would probably set max
failures per host to be > max failures per executor. In this case, the new
host-level behavior is only problematic if (1) multiple executors on the host
have this being-shutdown-issue AND (2) YARN allocates more executors on the
host after that.
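To make the executor-vs-host interaction concrete, here's a rough sketch of how I'd expect a user to set the thresholds so that a single shutting-down executor can't blacklist its whole host (the exact config names are whatever this PR settles on; the ones below are illustrative assumptions):

```scala
import org.apache.spark.SparkConf

// Illustrative settings only -- the config names and defaults are assumptions
// about what this PR introduces, not a statement of the final API.
val conf = new SparkConf()
  .set("spark.blacklist.enabled", "true")
  // A task stops being scheduled on an executor after this many failed attempts there.
  .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
  // Keep the per-host thresholds strictly higher than the per-executor one, so a
  // single being-shut-down executor can't blacklist the entire host for the task set.
  .set("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
  .set("spark.blacklist.stage.maxFailedExecutorsPerNode", "2")
```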
### (2) Temporary Resource Contention
**Problem**: Sometimes machines have temporary resource contention; e.g.,
disk or memory is temporarily full with data from another job. If we don't do
anything about this, tasks will repeatedly get re-scheduled on the bad executor
(e.g., due to locality), eventually causing the task to exhaust its max number
of failures and the job to be aborted.
**Approach in current Spark**: A per-task blacklist avoids re-scheduling
tasks on the same executor for a configurable period of time (by default, 0).
This allows tasks to eventually get a chance to use the executor again (but as
@tgravescs pointed out, this timeout may be hard to configure, and needs to be
balanced with the locality wait time, since if the timeout is > locality wait
timeout, the task will probably get scheduled on a different machine anyway).
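As a concrete illustration of that balancing act (property names assumed, as above):

```scala
import org.apache.spark.SparkConf

// Sketch of the timeout / locality-wait interaction (assumed property names).
// With these values, a failed task is kept off the bad executor for 2s, but after
// 3s of waiting for locality the scheduler relaxes the locality level and places
// the task on another node anyway. If the blacklist time were set above
// spark.locality.wait, the task would likely have migrated to a different machine
// before the blacklist ever expired, which is the tuning difficulty noted above.
val conf = new SparkConf()
  .set("spark.scheduler.executorTaskBlacklistTime", "2000") // milliseconds
  .set("spark.locality.wait", "3s")
```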
**Approach in this PR**: After a configurable number of attempts, tasks
will be permanently blacklisted from the temporarily contended executor (or
host) and will be run on a different machine, even though the task may succeed on
the host later. The biggest consequence of this seems to be that the task may
be forced to run on a non-local machine (and it may need to wait for the
locality wait timer to expire before being scheduled).
@mridulm are these issues summarized correctly above? If so, can you
elaborate on why the approach in this PR isn't sufficient? I agree with Imran
that, if the approach in this PR doesn't seem sufficient for these two cases,
we should just leave in the old mechanism.