Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13599131
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
       val curTime = clock.getTime()
       var allowedLocality = getAllowedLocalityLevel(curTime)
-      if (allowedLocality > maxLocality) {
+      if (allowedLocality > maxLocality && myLocalityLevels.contains(maxLocality)) {
         allowedLocality = maxLocality   // We're not allowed to search for farther-away tasks
       }
--- End diff --
@mridulm - Thanks for replying. In my opinion, however, relaxing the
allowed locality won't change the scheduling order: NODE_LOCAL tasks (if any)
are still scheduled before RACK_LOCAL ones. And if we allow RACK_LOCAL but
find a NODE_LOCAL task, currentLocalityIndex is updated so that the next
call uses NODE_LOCAL as the constraint.
However, if we clamp allowedLocality to PROCESS_LOCAL when that level is
not actually valid for the TaskSetManager (i.e. it is not in
myLocalityLevels), the NODE_LOCAL and RACK_LOCAL tasks will be skipped and
we may end up picking tasks from pendingTasksWithNoPrefs instead.
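
To make the argument concrete, here is a rough, self-contained sketch of the
two clamping rules. It is not the actual TaskSetManager code: the Pending case
class, the clamp and findTask helpers, and the patched flag are hypothetical
stand-ins, and the lookup order (process-local, node-local, rack-local, then
no-prefs) is my simplified reading of how findTask behaves.

```scala
object LocalityClampSketch {
  // Ordered from most local to least local, so `>` means "farther away".
  object Locality extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
  }
  import Locality._

  // Hypothetical stand-in for the TaskSetManager's pending-task queues.
  final case class Pending(
      processLocal: List[String] = Nil,
      nodeLocal: List[String] = Nil,
      rackLocal: List[String] = Nil,
      noPrefs: List[String] = Nil)

  // patched = false: old rule, always clamp to maxLocality.
  // patched = true: rule from the diff, clamp only if maxLocality is a
  // level this TaskSetManager actually has.
  def clamp(
      allowed: Locality.Value,
      maxLocality: Locality.Value,
      myLocalityLevels: Seq[Locality.Value],
      patched: Boolean): Locality.Value =
    if (allowed > maxLocality && (!patched || myLocalityLevels.contains(maxLocality))) {
      maxLocality
    } else {
      allowed
    }

  // Simplified lookup: closer tasks are always tried first, and no-prefs
  // tasks are the fallback once the allowed levels are exhausted.
  def findTask(p: Pending, allowed: Locality.Value): Option[(String, Locality.Value)] =
    p.processLocal.headOption.map(t => (t, PROCESS_LOCAL))
      .orElse(if (allowed >= NODE_LOCAL) p.nodeLocal.headOption.map(t => (t, NODE_LOCAL)) else None)
      .orElse(if (allowed >= RACK_LOCAL) p.rackLocal.headOption.map(t => (t, RACK_LOCAL)) else None)
      .orElse(p.noPrefs.headOption.map(t => (t, PROCESS_LOCAL)))

  // A task set with no process-local tasks, offered with maxLocality = PROCESS_LOCAL.
  val pending = Pending(nodeLocal = List("n1"), rackLocal = List("r1"), noPrefs = List("np1"))
  val levels = Seq(NODE_LOCAL, RACK_LOCAL, ANY)
  val allowedByDelay = RACK_LOCAL // assumed result of getAllowedLocalityLevel

  val oldPick = findTask(pending, clamp(allowedByDelay, PROCESS_LOCAL, levels, patched = false))
  val newPick = findTask(pending, clamp(allowedByDelay, PROCESS_LOCAL, levels, patched = true))
}
```

Under these assumptions, oldPick is the no-prefs task "np1", because the clamp
to PROCESS_LOCAL hides both "n1" and "r1", while newPick is the NODE_LOCAL
task "n1". In both cases a NODE_LOCAL task is still tried before a RACK_LOCAL
one, which is why I think the relaxed check does not change the ordering.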