Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/1313#discussion_r14723764
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -362,21 +363,25 @@ private[spark] class TaskSetManager(
}
}
- // Look for no-pref tasks after rack-local tasks since they can run anywhere.
- for (index <- findTaskFromList(execId, pendingTasksWithNoPrefs)) {
- return Some((index, TaskLocality.PROCESS_LOCAL, false))
- }
-
- if (TaskLocality.isAllowed(locality, TaskLocality.ANY)) {
- for (index <- findTaskFromList(execId, allPendingTasks)) {
+ if (locality == maxAllowedLocality) {
+ // Look for no-pref tasks after rack-local tasks since they can run anywhere.
+ for (index <- findTaskFromList(execId, pendingTasksWithNoPrefs)) {
--- End diff --
pendingTasksWithNoPrefs should be drained after NODE_LOCAL tasks have been
picked, and before RACK_LOCAL tasks are picked for the current executor.
Under 'normal' conditions, this list contains tasks which have no locality
preference (running on any node is fine) - so if there are no node-specific
tasks available for an executor, running these before RACK_LOCAL minimizes
data transfer in the cluster.
RACK_LOCAL tasks typically get scheduled much later in a job (via locality
wait) to avoid inter-node data transfer: so pushing no-pref tasks to after
that point is inefficient (and simply results in idling executors).
Unfortunately, given recent scheduler changes, it is no longer clear what the
population of pendingTasksWithNoPrefs would be: tasks which have no preference
at all, or tasks which have a preference that is not currently satisfiable.
Is that the reason for the confusion?
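To make the ordering concrete, here is a minimal, self-contained sketch of the search order argued for above. The locality levels mirror Spark's TaskLocality names, but the pending lists and the lookup helper are simplified stand-ins, not the real TaskSetManager internals:

```scala
// Hypothetical sketch: the locality search order this comment argues for.
// Names (searchOrder, firstSchedulable) are invented for illustration.
object LocalityOrderSketch {
  sealed trait Locality
  case object ProcessLocal extends Locality
  case object NodeLocal extends Locality
  case object NoPref extends Locality   // pendingTasksWithNoPrefs analogue
  case object RackLocal extends Locality
  case object Any extends Locality

  // No-pref tasks are consulted after node-local but BEFORE rack-local:
  // they can run anywhere, so running them first avoids inter-node transfer
  // and keeps executors busy during the rack-local locality wait.
  val searchOrder: Seq[Locality] =
    Seq(ProcessLocal, NodeLocal, NoPref, RackLocal, Any)

  // Return the first pending task index found, walking lists in order.
  def firstSchedulable(pending: Map[Locality, Seq[Int]]): Option[(Int, Locality)] = {
    for (level <- searchOrder; idx <- pending.getOrElse(level, Nil).headOption)
      return Some((idx, level))
    None
  }

  def main(args: Array[String]): Unit = {
    // No node-local work available: the no-pref task should win over rack-local.
    val pending = Map[Locality, Seq[Int]](NoPref -> Seq(7), RackLocal -> Seq(3))
    println(firstSchedulable(pending))
  }
}
```

Under this ordering, an executor with no node-local work picks up a no-pref task immediately instead of waiting out the rack-local delay.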