Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45174261
Hei @lirui-intel, I talked about this a bit with Kay and we think there's a
simpler and more general approach.
* In addPendingTask, always add the task to host, rack and
executor-specific lists even if those executors aren't currently up (i.e.
remove the current if statements that present that); then add it to
pendingTasksWithNoPrefs only if there were no such executors up
* When a new executor comes up, remove any tasks from
pendingTasksWithNoPrefs that have preferred locations on that executor / host /
rack; now locality-aware scheduling will automatically happen for those
One other thing to watch out for will be the time complexity of these
operations -- we don't want linear-time stuff in the size of
pendingTasksWithNoPrefs if we can help it because there might be hundreds of
executors coming up over time. But we can worry about that once we have an
implementation.
This feature is great to have overall though because it will allow dynamic
resizing of clusters (e.g. on YARN we might scale up and down based on whether
we have pending task sets).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---