Github user srowen commented on the issue:
@tgravescs that's actually progress. You're no longer saying that the goal
is to keep a few executors around just in case
(https://issues.apache.org/jira/browse/SPARK-21656), or that the problem is
waiting on locality.
I believe you're now saying the problem is what I asked about in
https://github.com/apache/spark/pull/18874#issuecomment-321315467: there
should be no way to go to 0 executors when there is any work to do. The
scheduler should never make that decision, even if the minimum allows it. I agree.
You're actually saying something stronger: the number of executors should
not go below the target, not just the minimum. If so, can we update the
description to state it that way? (And if so, is checking against the minimum redundant?)
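To make the stronger invariant concrete, here is a minimal sketch of the guard being proposed. This is not Spark's actual ExecutorAllocationManager code; the method and parameter names are illustrative:

```java
// Hypothetical sketch of the removal guard discussed above -- not the
// actual ExecutorAllocationManager logic in Spark.
public class AllocationGuard {
    static boolean mayRemoveIdleExecutor(int numExecutors, int target,
                                         int min, boolean hasPendingWork) {
        // Never drop below the configured minimum.
        if (numExecutors <= min) {
            return false;
        }
        // The stronger invariant: while there is work to do, never drop
        // below the current target. When target >= min, this check alone
        // makes the minimum check redundant for the pending-work case.
        if (hasPendingWork && numExecutors <= target) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // At target with pending work: removal refused.
        System.out.println(mayRemoveIdleExecutor(2, 2, 0, true));
        // Above target: a legitimately idle executor may still go.
        System.out.println(mayRemoveIdleExecutor(3, 2, 0, true));
    }
}
```

Under this sketch, the 0-executor case falls out for free: with pending work, numExecutors can never be reduced below a positive target.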
But what about just fixing the 0-executor case, since that is the scenario
where no progress can be made?
This change is a heuristic with side effects, as noted just above: it means
you don't remove legitimately idle executors that the scheduler won't use.
It harms the common case, though probably only marginally, and it's a behavior change.
It only helps the case where the driver is stuck for longer than the
executor idle timeout. I think you have bigger problems if that's the case,
right? If you have 60s GC pauses, you need to tune GC (or the idle timeout), but
it's fair to have to tune _something_ if you don't like the slowdown from
executors having to be reallocated.
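For reference, the knobs in question are standard dynamic-allocation settings; the values below are purely illustrative, e.g. raising the idle timeout so it outlasts a long GC pause rather than changing the removal heuristic:

```properties
# spark-defaults.conf (illustrative values, assuming dynamic allocation is in use)
spark.dynamicAllocation.enabled              true
spark.dynamicAllocation.minExecutors         1
spark.dynamicAllocation.executorIdleTimeout  120s
```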
Is this a good tradeoff? I don't think so, but I don't feel strongly about it.
Is it important to address the 0-executor case, more narrowly? Yes.