Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/892#issuecomment-45174261
  
    Hei @lirui-intel, I talked about this a bit with Kay and we think there's a 
simpler and more general approach.
    * In addPendingTask, always add the task to host, rack and 
executor-specific lists even if those executors aren't currently up (i.e. 
remove the current if statements that present that); then add it to 
pendingTasksWithNoPrefs only if there were no such executors up
    * When a new executor comes up, remove any tasks from 
pendingTasksWithNoPrefs that have preferred locations on that executor / host / 
rack; now locality-aware scheduling will automatically happen for those
    
    One other thing to watch out for will be the time complexity of these 
operations -- we don't want linear-time stuff in the size of 
pendingTasksWithNoPrefs if we can help it because there might be hundreds of 
executors coming up over time. But we can worry about that once we have an 
implementation.
    
    This feature is great to have overall though because it will allow dynamic 
resizing of clusters (e.g. on YARN we might scale up and down based on whether 
we have pending task sets).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to