Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-49558463
Hi @kayousterhout @mateiz @mridulm @lirui-intel, thanks for the comments.
I just updated the patch. Here is the basic idea of the current PR:
1. Added a NOPREF locality value to TaskLocality, which is meant to be
farther than NODE_LOCAL but nearer than RACK_LOCAL.
2. When we fail to find a task at a given level, we retry findTask with
NOPREF as the maxLocality. For example, if no task is found at
PROCESS_LOCAL, resourceOffer is called again with NOPREF as the parameter:
if a NODE_LOCAL task exists it is started first, otherwise we start a
NOPREF task. If we only have NOPREF tasks, we no longer need to wait 3
seconds for the current locality to reach the NODE_LOCAL level; instead,
resourceOffer(..., NOPREF, ...) is called immediately after failing to find
a PROCESS_LOCAL task.
3. To avoid traversing the task list twice, I added a minimumLocality
parameter to findTask, so that we can skip the levels already known to be
unavailable (PROCESS_LOCAL or NODE_LOCAL) when looking for NOPREF tasks.
4. Another minor change is to calculate the available localities when an
executor is lost.
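
Points 1 to 3 could be sketched roughly as below. This is only a simplified
illustration, not the code in the patch: PendingTask, the findTask signature,
and the offer helper are hypothetical stand-ins, and the delay-scheduling
timer is left out entirely.

```scala
// Illustrative sketch only; names follow the description above, not the real Spark source.
object LocalitySketch {
  // Declared nearest to farthest, so NOPREF falls between NODE_LOCAL and RACK_LOCAL.
  object TaskLocality extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, NOPREF, RACK_LOCAL, ANY = Value
  }
  import TaskLocality._

  // Hypothetical pending task: `level` is the best locality it can get on the offered executor.
  final case class PendingTask(id: Int, level: TaskLocality.Value)

  // Hypothetical findTask: looks only at levels in [minimumLocality, maxLocality]
  // and returns the nearest match, so levels below minimumLocality are not re-scanned.
  def findTask(pending: Seq[PendingTask],
               minimumLocality: TaskLocality.Value,
               maxLocality: TaskLocality.Value): Option[PendingTask] = {
    val candidates =
      pending.filter(t => t.level >= minimumLocality && t.level <= maxLocality)
    if (candidates.isEmpty) None else Some(candidates.minBy(_.level.id))
  }

  // Hypothetical offer: try up to the current level first; on failure, retry
  // immediately with NOPREF as maxLocality, skipping the levels just searched.
  def offer(pending: Seq[PendingTask],
            currentLevel: TaskLocality.Value): Option[PendingTask] =
    findTask(pending, PROCESS_LOCAL, currentLevel).orElse {
      if (currentLevel < NOPREF)
        findTask(pending, TaskLocality(currentLevel.id + 1), NOPREF)
      else None
    }
}
```

With only a NOPREF task pending and the current level at PROCESS_LOCAL, offer
returns that task right away instead of leaving the slot idle; if a NODE_LOCAL
task is also pending, it is picked first.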
Waiting for Jenkins...