Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/3779#issuecomment-72584682
_(Stream-of-consciousness ahead, mostly for my own benefit as I think
through the implications of this PR)_
I'd like to understand whether this patch has any performance implications
for Spark jobs in general. Is there any scenario where this scheduling change
might introduce performance regressions?
I guess that a task's locality wait is a sort of hard deadline, where we
won't consider scheduling a task at a lower locality level until at least that
much time has elapsed. This sounds like a per-task property, where the
deadline for one task shouldn't apply to or influence other tasks (i.e. each
task is treated independently), but the bug reported here sounds like we're
applying the locality waits to sets of tasks.
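
To make the "per-task deadline" reading concrete, here is a minimal toy model in Python. It is not Spark's scheduler code: the `Task` class, the `allowed_level` helper, and the fixed 3-second wait per level are all assumptions made for illustration, and the level list is a simplified version of Spark's locality levels.

```python
from dataclasses import dataclass

# Simplified locality levels, most local first (illustrative list only).
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


@dataclass
class Task:
    name: str
    preferred: int       # index into LEVELS of the task's preferred level
    submitted_at: float  # seconds on some shared clock


def allowed_level(task: Task, now: float, wait: float = 3.0) -> int:
    """Index of the least-local level this task may run at, at time `now`.

    Each full `wait` interval that elapses since the task was submitted
    unlocks one more (less local) level. Only the task's own clock is
    consulted, so one task's deadline never affects another's.
    """
    elapsed = now - task.submitted_at
    steps = int(elapsed // wait)
    return min(task.preferred + steps, len(LEVELS) - 1)
```

Under this model, a task submitted at t=0 with a process-local preference may fall back to node-local at t=3.5, while a task submitted at t=3 is still held at its own preferred level; neither deadline depends on the other.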
It seems like we always want to attempt to schedule tasks in decreasing
order of locality, so if there are unscheduled process-local tasks then we
should always attempt to schedule them before any tasks at lower locality
levels. In addition, if there are free slots in the cluster and there are
unscheduled tasks in the scheduler's current locality level, then we should
schedule those tasks. If we've exhausted all tasks at a particular locality
level, then it makes sense to immediately move on to scheduling at the next
lower locality level.
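
The three rules in the paragraph above can be sketched as a small Python function. This is a hedged illustration, not the patch under review: `schedule`, the `pending` mapping, and `free_slots` are hypothetical names, every pending task is assumed to be runnable on any free slot, and locality waits are deliberately left out so that only the ordering and fall-through behavior is shown.

```python
from collections import deque

# Simplified locality levels, most local first (illustrative list only).
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


def schedule(pending, free_slots):
    """Launch up to `free_slots` tasks, most-local level first.

    `pending` maps a level name to a deque of unscheduled task names and
    is mutated as tasks are launched. Returns a list of (task, level).
    """
    launched = []
    for level in LEVELS:
        queue = pending.get(level, deque())
        # Fill free slots from the current level while it has tasks.
        while queue and len(launched) < free_slots:
            launched.append((queue.popleft(), level))
        if len(launched) == free_slots:
            break
        # The queue is empty here: this level is exhausted, so move on
        # to the next lower level immediately instead of waiting.
    return launched
```

For example, with two process-local tasks, no node-local or rack-local tasks, two tasks at `ANY`, and three free slots, the sketch launches both process-local tasks and then one `ANY` task without pausing at the empty levels in between.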
If we have tasks at some high locality level that cannot be scheduled at
their preferred locality and there are tasks at a lower locality level that
_can_ be scheduled, then I guess we might be concerned about whether scheduling
the less-local tasks could rob tasks waiting at a higher locality level of
their opportunity to run. This shouldn't happen, though, since those tasks
will already have been offered those resources and turned them down.
Therefore, I think that this patch is a good fix: it doesn't make sense to
let a single task with strong preferences delay or block the scheduling of
other tasks with weaker preferences or preferences for other resources. I'll
take a closer look at the code and tests now.