Github user lirui-intel commented on the pull request:

    https://github.com/apache/spark/pull/892#issuecomment-45293719
  
    Hi @kayousterhout, let's consider a map stage whose tasks all have a
NODE_LOCAL preference. Then pendingTasksForExecutor is empty and every task is
added to pendingTasksForHost. If no node is available at the beginning, all
tasks are also added to pendingTasksWithNoPrefs (as @mateiz suggested). When
we look for a task to launch, we always try to launch a PROCESS_LOCAL one
first. Now suppose an executor comes in that could satisfy some tasks in
pendingTasksForHost. Since we always try PROCESS_LOCAL first,
pendingTasksForHost is simply skipped in TaskSetManager.findTask, and we end
up picking a task from pendingTasksWithNoPrefs instead. Note that there is an
if statement guarding the picks from pendingTasksForHost and
pendingTasksForRack, testing whether we currently allow such a "low" locality
level, but tasks in pendingTasksForExecutor and pendingTasksWithNoPrefs are
picked unconditionally, since they are considered to have the highest locality
level.
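    To make the skipping behavior concrete, here is a minimal Python sketch of
the lookup order described above. The function and queue names mirror
TaskSetManager's pending-task lists for readability, but this is an
illustrative model, not Spark's actual (Scala) implementation:

    ```python
    # Hypothetical sketch of the described findTask ordering; the names mirror
    # TaskSetManager's pending-task queues but this is not Spark's real code.
    def find_task(exec_id, host, rack, allowed_level,
                  for_executor, for_host, no_prefs, for_rack):
        """Return (task_index, locality) following the described lookup order."""
        # Executor-local (PROCESS_LOCAL) tasks are taken unconditionally.
        if for_executor.get(exec_id):
            return for_executor[exec_id][0], "PROCESS_LOCAL"
        # The host queue is gated: consulted only if NODE_LOCAL is allowed.
        if allowed_level in ("NODE_LOCAL", "RACK_LOCAL", "ANY") and for_host.get(host):
            return for_host[host][0], "NODE_LOCAL"
        # No-pref tasks are also taken unconditionally (treated as highest
        # level), which is why they win over a skipped pendingTasksForHost.
        if no_prefs:
            return no_prefs[0], "PROCESS_LOCAL"
        # The rack queue is likewise gated on the allowed locality level.
        if allowed_level in ("RACK_LOCAL", "ANY") and rack and for_rack.get(rack):
            return for_rack[rack][0], "RACK_LOCAL"
        return None

    # Scenario from the comment: every task is NODE_LOCAL-only, so tasks sit in
    # both for_host and no_prefs, and the executor-local queue is empty.
    for_host = {"hostA": [1, 2]}
    no_prefs = [1, 2, 3]

    # The allowed level is still PROCESS_LOCAL, so the host queue is skipped
    # and a no-pref task is picked, even though hostA could run its
    # NODE_LOCAL tasks right away.
    print(find_task("exec1", "hostA", None, "PROCESS_LOCAL",
                    {}, for_host, no_prefs, {}))  # → (1, 'PROCESS_LOCAL')
    ```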

