Github user srowen commented on the issue:
Seems not unreasonable to me given the current problem statement. It does
solve the possible 0-executor problem, and then some.
The possible impact on a normal app looks like this: run a bunch of short-lived
stages (think iterative ML). The target executor count stays high, but the tasks
schedule on just a subset of executors, because they finish quickly and the
remaining tasks wait for a data-local slot and finish on those same executors. In
this scenario, the extra executors can't be released, even though they are always
idle, because they have to stay around to keep up the target count. Right now,
they'd be released. This scenario is not unrealistic in my experience, but it's
the only problem scenario I can think of.
(Am I right that the check against the minimum executor count is now redundant
here? The target can't go under the minimum count, and on removal the executor
count can now no longer go under the target count.)
I guess I'm still somewhat unclear how, in the stuck-driver scenario,
`onExecutorBusy` isn't firing to mark executors as not-idle while the
idle-timeout `schedule()` loop is still running fine. But it's imaginable. Yes,
this change fixes that scenario, and it sounds like it has been observed, though
it may be chalked up to dire driver states that are going to fall over anyhow. It
_does_ sound logical not to let the idle-timeout loop take the executor count
below the target, even though I had assumed the concern was the scenario above, maybe?
Those are the things we're weighing, and there's clear support for possibly
inconveniencing the first scenario in order to both help the second and fix the
0-executor risk, so I have no issue with that.