Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/3816#issuecomment-68326024
Yes, @mateiz was right. Here are a few more clues to help with your
debugging:
1. NO_PREF is not adjusted by the getAllowedLocalityLevel() method of
TaskSetManager (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L431).
I did this so that NO_PREF tasks can be scheduled ASAP instead of
waiting for NODE_LOCAL.
2. Based on 1, when resourceOffers() of TaskSchedulerImpl
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L217)
is called again **before the more local level expires** and
**after a NO_PREF task is scheduled**, we may see the locality level
bumped up to RACK_LOCAL instead of going through PROCESS_LOCAL, NODE_LOCAL,
etc.
3. In the JIRA discussion, Rui Li's understanding of the if check is
correct: we need to return PROCESS_LOCAL and also don't want to reset
currentLocalityIndex (so I didn't understand the reason for the performance
degradation you mentioned).
4. The logic of returning PROCESS_LOCAL for NO_PREF tasks had actually
existed for a long while before my patch to TaskSetManager.scala... I just
followed that idea and my own understanding of it.
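
To make points 1, 3 and 4 concrete, here is a rough toy model in Python. It is
not Spark's Scala code: the class ToyTaskSet, its methods, the 3-second waits,
and the assumption that a task is always available at the allowed level are
all hypothetical simplifications. It only illustrates the behavior described
above: a NO_PREF offer skips the time-based adjustment, is reported as
PROCESS_LOCAL, and leaves the locality index and timer untouched.

```python
from dataclasses import dataclass

# Delay-scheduling ladder, from most local to least local.
# NO_PREF is handled separately in this toy model.
LADDER = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


@dataclass
class ToyTaskSet:
    """Hypothetical, heavily simplified stand-in for a task set manager."""
    waits: dict                    # seconds to wait at each level (assumed values)
    current_index: int = 0         # index into LADDER
    last_launch_time: float = 0.0  # time of the last non-NO_PREF launch

    def allowed_level(self, now):
        """Relax the locality level once the wait for the current one expires."""
        while self.current_index < len(LADDER) - 1:
            wait = self.waits.get(LADDER[self.current_index], 0.0)
            if now - self.last_launch_time < wait:
                break
            self.last_launch_time += wait
            self.current_index += 1
        return LADDER[self.current_index]

    def offer(self, max_locality, now):
        """Return the locality level reported for the task launched on this offer."""
        if max_locality == "NO_PREF":
            # Point 1: no time-based adjustment for NO_PREF offers.
            # Points 3 and 4: report PROCESS_LOCAL, and do not reset
            # current_index or last_launch_time.
            return "PROCESS_LOCAL"
        allowed = self.allowed_level(now)
        # Never go less local than the offer permits.
        if LADDER.index(allowed) > LADDER.index(max_locality):
            allowed = max_locality
        # Assumption: a task is always available at the allowed level,
        # so every such offer launches one and resets index and timer.
        self.current_index = LADDER.index(allowed)
        self.last_launch_time = now
        return allowed


ts = ToyTaskSet(waits={"PROCESS_LOCAL": 3.0, "NODE_LOCAL": 3.0, "RACK_LOCAL": 3.0})
print(ts.offer("NO_PREF", now=1.0), ts.current_index, ts.last_launch_time)
print(ts.offer("ANY", now=4.0), ts.current_index, ts.last_launch_time)
```

In this sketch the NO_PREF launch at time 1.0 does not restart the timer, so
the offer at time 4.0 still sees the PROCESS_LOCAL wait as expired and moves
on to NODE_LOCAL.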