Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45293719
Hi @kayousterhout, let's consider a map stage whose tasks all have
NODE_LOCAL preference. Then pendingTasksForExecutor is empty and all tasks are
added to pendingTasksForHost. If no node is available at the beginning,
all tasks are also added to pendingTasksWithNoPrefs (as @mateiz suggested).
When we look for a task to launch, we always try to launch a PROCESS_LOCAL one
first. Suppose an executor now comes in that can satisfy some tasks in
pendingTasksForHost. Since we always try PROCESS_LOCAL first,
pendingTasksForHost is simply skipped in TaskSetManager.findTask, and we
end up picking a task from pendingTasksWithNoPrefs instead. You can see there is
an if statement guarding the picks from pendingTasksForHost and
pendingTasksForRack, testing whether we currently allow such a "low" locality
level; but tasks in pendingTasksForExecutor and pendingTasksWithNoPrefs are
picked unconditionally, since they are considered to have the highest level.
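To make the ordering concrete, here is a heavily simplified sketch (not the actual Spark source; `Pending`, `findTask`, and the queue shapes are illustrative stand-ins) of how the queues are consulted: executor-local and no-pref tasks are taken unconditionally, while the host queue is gated on the currently allowed locality level.

```scala
// Hypothetical, simplified model of TaskSetManager.findTask's queue order.
// Names mirror the discussion above but the types are made up for the sketch.
object LocalitySketch {
  object Locality extends Enumeration {
    // Lower id = stricter locality; the allowed level relaxes over time.
    val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
  }
  import Locality._

  // The pending queues described in the comment (task ids as Ints).
  case class Pending(
      forExecutor: Map[String, List[Int]],
      forHost: Map[String, List[Int]],
      noPrefs: List[Int])

  // Executor-local and no-pref tasks are picked unconditionally;
  // host-level tasks are only considered once NODE_LOCAL is allowed.
  def findTask(execId: String, host: String,
               allowed: Locality.Value, p: Pending): Option[Int] = {
    p.forExecutor.getOrElse(execId, Nil).headOption
      .orElse(p.noPrefs.headOption) // unconditional: treated as highest level
      .orElse(
        if (allowed >= NODE_LOCAL) p.forHost.getOrElse(host, Nil).headOption
        else None)
  }

  def main(args: Array[String]): Unit = {
    // Task 1 prefers host "h2", but was also added to noPrefs because no
    // node was up when the stage was submitted.
    val p = Pending(Map.empty, Map("h2" -> List(1)), List(1))
    // An executor on "h1" at the PROCESS_LOCAL level still launches task 1
    // (from noPrefs) even though its preferred host is "h2":
    println(findTask("exec1", "h1", PROCESS_LOCAL, p)) // Some(1)
  }
}
```

With the suggested change of mirroring host-preferring tasks into pendingTasksWithNoPrefs, this is the path that defeats the locality wait: the no-prefs queue serves the task at PROCESS_LOCAL level before the host queue is ever consulted.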