Github user squito commented on the issue:
https://github.com/apache/spark/pull/15249
@mridulm on the questions about expiry from blacklists, you are not missing
anything -- this explicitly does not do any timeouts at the taskset level (this
is mentioned in the design doc). The timeout code you see is mostly incremental
groundwork toward https://github.com/apache/spark/pull/14079, and doesn't
actually add any value here on its own.
The primary motivation for blacklisting that I've seen is actually quite
different from the use case you are describing -- it's not to help deal w/
resource contention, but to deal w/ truly broken resources (a bad disk in all
the cases I can think of). In fact, for those cases 1 hour is really short --
users probably want something more like 6-12 hours. But 1 hr isn't so bad
either; it just means the bad resources need to be "rediscovered" that often,
with a scheduling hiccup while that happens.
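To make that concrete, here's roughly the kind of setting I mean -- the legacy
key name is from memory, so treat it as an assumption, and the 12-hour value
just illustrates the bad-disk case:

```scala
import org.apache.spark.SparkConf

// Rough sketch only -- the legacy per-task blacklist timeout key is written
// here as I remember it, and the 12h value just illustrates the case where
// you want a truly broken disk to stay blacklisted for most of a day.
val conf = new SparkConf()
  .set("spark.scheduler.executorTaskBlacklistTime", (12 * 60 * 60 * 1000L).toString) // ms
```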
Your use case, by contrast, is really a form of backoff to deal w/ resource
contention. I have actually talked to a couple of different folks recently
about doing something like this and think it would be great, though I see
problems with this approach: it still allows other tasks to be scheduled on
those executors, and the time isn't relative to the task runtime, etc.
Nonetheless, an issue here might be that the old option serves some purpose
which is no longer supported. Do we need to add it back in? Just adding the
timeout logic back is pretty easy, though
(a) I need to figure out the right place to do it so that it doesn't impact
scheduling performance
and more importantly
(b) I really worry about being able to configure things so that
blacklisting can actually handle totally broken resources. E.g., say you
set the timeout to 10s. If your tasks take 1 minute each, then your one bad
executor might cycle through the leftover tasks, fail them all, pass the
timeout, and repeat that cycle a few times until you go over
spark.task.maxFailures (rough arithmetic sketched below). I don't see a good
way to deal w/ that while still setting a sensible timeout for the entire
application.
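Here's the arithmetic for that scenario -- the numbers are just illustrative,
and the snippet is a standalone sketch, not anything in this PR:

```scala
// Standalone sketch: one bad executor fails a leftover task almost instantly,
// waits out a short per-task blacklist timeout, and picks the task up again --
// all while the healthy executors are still busy with their 1-minute tasks.
object BlacklistTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val timeoutMs     = 10 * 1000L  // hypothetical 10s per-task blacklist timeout
    val taskRuntimeMs = 60 * 1000L  // how long healthy executors are tied up per task
    val failFastMs    = 1 * 1000L   // a bad disk fails the task almost immediately
    val maxFailures   = 4           // default spark.task.maxFailures

    // fail / wait-out-the-timeout rounds that fit before any healthy executor frees up
    val rounds = taskRuntimeMs / (failFastMs + timeoutMs)
    println(s"failure rounds on the bad executor: $rounds")
    println(s"hits spark.task.maxFailures ($maxFailures)? ${rounds >= maxFailures}")
  }
}
```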
Two other workarounds:
(2) Just enable the per-task timeout when the legacy configuration is used,
and leave it undocumented. That doesn't change behavior, but the configuration
becomes kind of a mess, and it'll be a headache to keep maintaining it.
(3) Add a timeout just to *taskset* level blacklisting (rough sketch below).
That is a behavior change from the existing blacklisting, which has a timeout
per *task*, but it removes the interaction w/ spark.task.maxFailures that
we've always got to tiptoe around, and I also think it might satisfy your use
case even better. I still don't think it's a great solution to the problem --
we need something else for handling this sort of backoff properly -- so I
don't feel great about it getting shoved into this feature.
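For what it's worth, the expiry bookkeeping for (3) would be pretty small --
something like the following. The names are hypothetical, not anything in the
patch; it's just to show the shape of tracking expiry per executor within a
taskset rather than per (executor, task):

```scala
import scala.collection.mutable

// Hypothetical illustration of option (3): expiry tracked per executor within
// a single taskset, instead of per (executor, task) as the old code did.
class TaskSetBlacklistWithTimeout(
    timeoutMs: Long,
    clock: () => Long = () => System.currentTimeMillis()) {

  private val blacklistedUntil = mutable.HashMap.empty[String, Long] // executorId -> expiry

  def addFailedExecutor(executorId: String): Unit =
    blacklistedUntil(executorId) = clock() + timeoutMs

  def isExecutorBlacklisted(executorId: String): Boolean =
    blacklistedUntil.get(executorId).exists(_ > clock())
}
```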
I'm leaning toward (3) but will give it a bit more thought. Also pinging
@kayousterhout @tgravescs @markhamstra for opinions, since this is a bigger
design point to consider.