Rui Li created SPARK-1937:
-----------------------------

             Summary: Tasks can be submitted before executors are registered
                 Key: SPARK-1937
                 URL: https://issues.apache.org/jira/browse/SPARK-1937
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.0.0
            Reporter: Rui Li


During construction, TaskSetManager assigns each task to one of several pending
lists according to the task's preferred locations. If none of the preferred
locations is currently available, it instead puts the task into
“pendingTasksWithNoPrefs”, the list of tasks that have no preferred locations.
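A minimal, self-contained sketch of that construction-time behavior
(illustrative only, not the actual TaskSetManager code; aliveHosts stands in
for the scheduler's executor-registration lookups such as
hasExecutorsAliveOnHost, and the per-executor/per-rack lists are omitted):

{code:scala}
import scala.collection.mutable.{ArrayBuffer, HashMap}

// Toy model of the construction-time assignment described above.
object PendingListSketch {
  val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]]
  val pendingTasksWithNoPrefs = new ArrayBuffer[Int]

  // Hosts whose executors have already registered with the driver.
  var aliveHosts: Set[String] = Set.empty

  def addPendingTask(index: Int, preferredHosts: Seq[String]): Unit = {
    val aliveLocs = preferredHosts.filter(aliveHosts.contains)
    if (aliveLocs.isEmpty) {
      // No preferred host is "alive" yet, so the preference is dropped and
      // the task is treated as having no locality preference at all.
      pendingTasksWithNoPrefs += index
    } else {
      for (host <- aliveLocs) {
        pendingTasksForHost.getOrElseUpdate(host, new ArrayBuffer[Int]) += index
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Tasks submitted before any executor registers...
    for (i <- 0 until 4) addPendingTask(i, Seq("node6"))
    // ...all lose their preference for node6:
    println(pendingTasksWithNoPrefs) // ArrayBuffer(0, 1, 2, 3)
  }
}
{code}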
The problem is that tasks may be submitted before the executors are registered
with the driver, in which case TaskSetManager puts all the tasks into
pendingTasksWithNoPrefs. Later, when it looks for a task to schedule, it picks
one from this list and assigns it to an arbitrary executor, because
TaskSetManager assumes such tasks run equally well on any node.
This defeats data locality, slows down the whole job, and can cause load
imbalance between executors.
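The scheduling side of the same toy model shows why (again illustrative; in
the real scheduler the no-prefs list is consulted without a locality
constraint, which is why these tasks go to the first executor that makes an
offer):

{code:scala}
object OfferSketch {
  import PendingListSketch._

  // With every task sitting in pendingTasksWithNoPrefs, the host-specific
  // lookup always misses, so any offer from any executor immediately gets
  // a task, regardless of where the data actually lives.
  def findTask(host: String): Option[Int] = {
    pendingTasksForHost.get(host).flatMap(_.headOption)
      .orElse(pendingTasksWithNoPrefs.headOption)
  }
}
{code}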
I ran into this issue when running a Spark program on a 7-node cluster
(node6~node12). The program processes 100GB of data.
Since the data was uploaded to HDFS from node6, that node holds a complete
copy of the data. As a result, node6 finishes its tasks much faster, which in
turn lets it complete disproportionately more tasks than the other nodes.
To solve this issue, I think we shouldn't check the availability of
executors/hosts when constructing TaskSetManager. If a task prefers a node, we
simply add the task to that node's pending list. When the node is added later,
TaskSetManager can schedule the task at the proper locality level. If the
preferred node(s) never get added, TaskSetManager can still schedule the task
at locality level “ANY”, as sketched below.
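Here is the same toy model with the proposed change (a sketch, not the real
code; it assumes, as TaskSetManager already does, a catch-all allPendingTasks
list that serves offers at locality ANY):

{code:scala}
import scala.collection.mutable.{ArrayBuffer, HashMap}

// Proposed behavior: never check executor/host availability here. The task
// goes on its preferred host's list unconditionally; pendingTasksWithNoPrefs
// is reserved for tasks that truly have no preference.
object ProposedSketch {
  val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]]
  val pendingTasksWithNoPrefs = new ArrayBuffer[Int]
  val allPendingTasks = new ArrayBuffer[Int] // fallback for locality ANY

  def addPendingTask(index: Int, preferredHosts: Seq[String]): Unit = {
    if (preferredHosts.isEmpty) {
      pendingTasksWithNoPrefs += index
    } else {
      for (host <- preferredHosts) {
        pendingTasksForHost.getOrElseUpdate(host, new ArrayBuffer[Int]) += index
      }
    }
    // Every task also sits in the catch-all list, so even if a preferred
    // host never registers, the task can still be scheduled at level ANY.
    allPendingTasks += index
  }
}
{code}

With this, offers from node6 would drain its host list at NODE_LOCAL once
node6 registers; if it never registers, delay scheduling eventually relaxes
the allowed level and the task is served from allPendingTasks.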



