Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/1313#issuecomment-50253029
Hi @mateiz, thanks for the comments.
If we just add a NO_PREF level, it avoids the unnecessary waiting when
we only have no-pref tasks.
However, in the following scenario we would still need to wait for some
time when we only have PROCESS_LOCAL and NO_PREFS tasks:
Suppose we have T1 (PROCESS_LOCAL), T2 (PROCESS_LOCAL), and T3 (NO_PREFS).
Then the valid localities would be PROCESS_LOCAL, NODE_LOCAL (because a
process-local task is also node-local), and NO_PREFS. After we have
scheduled T1 and T2, we need to wait for 3s to check whether there are any
NODE_LOCAL tasks; there are none, so only then do we move on to NO_PREFS
and launch T3.
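To make the scenario concrete, here is a toy sketch of how the valid
locality levels come out for T1, T2 and T3. It is not Spark's scheduler
code: the function name valid_levels and the dict-based task description
are hypothetical, and the level names simply follow the spelling used in
this comment.

```python
# Toy model of the scenario above; not Spark's actual scheduler code.
# valid_levels and the task_prefs layout are hypothetical names.

def valid_levels(task_prefs):
    """Return the locality levels that are valid for a set of tasks.

    task_prefs maps a task name to "PROCESS_LOCAL" or "NO_PREFS".
    A process-local task also counts as node-local, so NODE_LOCAL is
    included whenever PROCESS_LOCAL is present.
    """
    prefs = set(task_prefs.values())
    levels = []
    if "PROCESS_LOCAL" in prefs:
        levels += ["PROCESS_LOCAL", "NODE_LOCAL"]
    if "NO_PREFS" in prefs:
        levels.append("NO_PREFS")
    return levels


tasks = {"T1": "PROCESS_LOCAL", "T2": "PROCESS_LOCAL", "T3": "NO_PREFS"}
print(valid_levels(tasks))
```

The NODE_LOCAL entry is the one that causes the extra wait: no task is
pending at that level once T1 and T2 have been scheduled, yet the level
still sits between PROCESS_LOCAL and NO_PREFS.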
In the previous discussion, we thought that this kind of waiting is also
unnecessary (at least, it does not happen in the current master branch).
The current PR ensures that NO_PREFS tasks are launched only after the
PROCESS_LOCAL and NODE_LOCAL tasks, and that once those two
higher-priority levels are exhausted, we do not wait unnecessarily.
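The difference can be sketched as a toy timeline. This is a deliberate
simplification and not the PR's implementation: the schedule function,
its skip_wait_when_drained flag and the fixed level list are hypothetical,
and the 3-second figure is just the wait mentioned above.

```python
# Toy timeline comparing the two behaviours; not Spark's scheduler.
# All names here are hypothetical and chosen for illustration only.

LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "NO_PREFS"]
LOCALITY_WAIT_S = 3.0  # the 3s wait mentioned in the comment


def schedule(tasks, skip_wait_when_drained):
    """Return (launch_order, total_wait_seconds).

    tasks is a list of (name, level) pairs.
    If skip_wait_when_drained is True, a level with no pending tasks is
    skipped immediately; otherwise the locality wait is paid first.
    """
    pending = list(tasks)
    order, waited = [], 0.0
    for level in LEVELS:
        if not pending:
            break  # everything has been launched already
        runnable = [name for name, lvl in pending if lvl == level]
        if runnable:
            order.extend(runnable)
            pending = [(n, l) for n, l in pending if l != level]
        elif not skip_wait_when_drained:
            # Nothing to launch at this level: wait before falling through.
            waited += LOCALITY_WAIT_S
    return order, waited


tasks = [("T1", "PROCESS_LOCAL"), ("T2", "PROCESS_LOCAL"), ("T3", "NO_PREFS")]
print(schedule(tasks, skip_wait_when_drained=False))
print(schedule(tasks, skip_wait_when_drained=True))
```

In both runs the launch order is T1, T2, T3; only the time spent waiting
at the empty NODE_LOCAL level differs.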
Maybe I missed some comments... I don't think I did any refactoring here?
I'm rebasing the PR and adding the comments.