Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/892#issuecomment-45466828
  
    Thanks @lirui-intel for updating this!  Can you please add unit tests to 
check that this works properly when no node-local executors for the job are 
immediately available, and also for when only some node-local executors for the 
job are immediately available?
    
    I'm concerned there may be some corner cases that are not handled properly, 
which a unit test could help catch. In particular, one issues is that 
computeValidLocalityLevels() is no longer correct.  Because now you're always 
adding tasks to pendingTasksForExecutor and the other similar lists (even when 
that particular executor is not alive), if tasks have locality preferences 
(even if there are no alive executors that satisfy them), PROCESS_LOCAL, 
NODE_LOCAL, and RACK_LOCAL will always be added as valid locality levels in 
computeValidLocalityLevels().
    
    Consider a job with two tasks: task 0 has a locality constraint for which 
there is a rack local executor but not a process local or node local one, and 
task 1 has a locality constraint that is not met (there are no process local, 
node local, or rack local executors).  Here, pendingTasksWithNoPrefs will 
include task 1 (but not task 0, because it has a rack-local executor 
available).  As a result of the changed code mentioned above, PROCESS_LOCAL, 
NODE_LOCAL, and RACK_LOCAL will all be valid locality levels, so when 
findTask() is first called, the allowed locality level will be PROCESS_LOCAL.  
This means that RACK_LOCAL tasks will not yet be allowed, so task 0 will not be 
scheduled, and task 1, which is in pendingTasksWithNoPrefs, will get scheduled. 
 This is not the correct behavior: in this case, we'd prefer to schedule task 0 
(which will be rack-local).
    
    As a result of this problem, I think we need to change the way 
computeValidLocalityLevels() works -- probably we need to set the valid 
locality levels as part of the addPendingTask() function.
    
    Let me know if this example doesn't make sense or you think it is 
incorrect!  @mateiz you may want to weigh in here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to