Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-49234491
Thinking about this more: if I am not mistaken, the current scheduler can
produce suboptimal schedules when there are multiple TaskSetManagers. In
our experience this is particularly relevant to GraphX jobs.
When multiple TaskSetManagers are present and some of them have no
process-local or node-local executors, the locality level for those gets
relaxed so that it starts from rack-local or any.
Depending on the order in which the TaskSetManagers are processed, this can
result in rack-local tasks starting before process-local or node-local
ones, potentially producing worse schedules.
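
To make the concern concrete, here is a toy model of the ordering effect.
It is only a sketch and not Spark's scheduler code: every name in it
(schedule, task_sets, taskset_major, the executor "e1") is made up for
illustration, and it ignores delay scheduling and locality wait timers. It
assumes one free slot and compares two loop orders: walking all locality
levels per task set, versus trying every task set at one level before
relaxing to the next.

```python
# Toy model only: none of these names come from Spark's scheduler.
PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = range(4)
LEVELS = [PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY]


def schedule(task_sets, free_slots, taskset_major=True):
    """Assign tasks to executor slots; return (task, executor, level) tuples.

    task_sets:  ordered list of (name, tasks), where tasks maps a task id to
                {executor: best locality level that executor can offer}.
    free_slots: {executor: number of free slots}.
    """
    slots = dict(free_slots)
    pending = {name: dict(tasks) for name, tasks in task_sets}
    order = [name for name, _ in task_sets]
    if taskset_major:
        # Each task set walks all of its levels before the next gets a turn.
        passes = [(name, level) for name in order for level in LEVELS]
    else:
        # Every task set is tried at one level before any level is relaxed.
        passes = [(name, level) for level in LEVELS for name in order]

    launched = []
    for name, max_level in passes:
        # Iterate over a snapshot because launched tasks are removed below.
        for task, prefs in list(pending[name].items()):
            for executor, level in prefs.items():
                if level <= max_level and slots.get(executor, 0) > 0:
                    slots[executor] -= 1
                    launched.append((task, executor, level))
                    del pending[name][task]
                    break
    return launched


# Hypothetical workload: a single free slot on executor "e1".
task_sets = [
    ("A", {"a0": {"e1": RACK_LOCAL}}),     # no process- or node-local executor
    ("B", {"b0": {"e1": PROCESS_LOCAL}}),  # data cached on e1
]
slots = {"e1": 1}

print(schedule(task_sets, slots, taskset_major=True))   # a0 wins, rack-local
print(schedule(task_sets, slots, taskset_major=False))  # b0 wins, process-local
```

With task set A ordered first, the per-task-set loop hands the only slot to
the rack-local task a0, and the process-local task b0 has to wait. The
level-major loop gives the slot to b0 instead. Whether the real scheduler
behaves like the first variant is exactly the question being asked here.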
I am not sure whether this is intentional (it was unexpected to me), or
whether I have misunderstood and this cannot actually happen.
I am also uncertain about the impact of noprefs in this context, assuming
my reading above is valid, of course.