GitHub user CodingCat opened a pull request:

    https://github.com/apache/spark/pull/1313

    SPARK-2294: fix locality inversion bug in TaskManager

    copied from original JIRA 
(https://issues.apache.org/jira/browse/SPARK-2294): 
    
    If an executor E is free, a task may be speculatively assigned to E when 
there are other tasks in the job that have not been launched (at all) yet. 
Similarly, a task without any locality preferences may be assigned to E when 
there was another NODE_LOCAL task that could have been scheduled.
    This happens because TaskSchedulerImpl calls TaskSetManager.resourceOffer 
(which in turn calls TaskSetManager.findTask) with increasing locality levels, 
beginning with PROCESS_LOCAL, followed by NODE_LOCAL, and so on until the 
highest currently allowed level. Now, suppose NODE_LOCAL is the highest 
currently allowed locality level. The first time findTask is called, it will be 
called with max level PROCESS_LOCAL; if it cannot find any PROCESS_LOCAL tasks, 
it will try to schedule tasks with no locality preferences or speculative 
tasks. As a result, speculative tasks or tasks with no preferences may be 
scheduled instead of NODE_LOCAL tasks.
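
The inversion described above can be reproduced with a small, self-contained model. This is only a sketch under stated assumptions, not Spark's actual scheduler code: the `Level` enumeration, the `Task` case class, and the simplified `findTask` and `resourceOffer` signatures are all hypothetical stand-ins, and each task's `locality` is assumed to be the best level it could achieve on the offered executor E.

```scala
// Simplified model of the pre-fix behavior; NOT Spark's real TaskSetManager.
object LocalityInversionSketch {
  // Locality levels in increasing order, mirroring the names used in the description.
  object Level extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, ANY = Value
  }

  // Hypothetical pending task: `locality` is the best level the task can achieve
  // on the offered executor, or None if it has no locality preference.
  final case class Task(name: String, locality: Option[Level.Value])

  // Pre-fix behavior as described: if no task fits within `level`, fall through
  // to tasks without locality preferences, regardless of how low `level` is.
  def findTask(pending: Seq[Task], level: Level.Value): Option[Task] =
    pending.find(_.locality.exists(_ <= level))
      .orElse(pending.find(_.locality.isEmpty))

  // The caller tries increasing levels up to the highest currently allowed one
  // and launches the first task that any call returns.
  def resourceOffer(pending: Seq[Task], highestAllowed: Level.Value): Option[Task] =
    Level.values.iterator
      .takeWhile(_ <= highestAllowed)
      .map(level => findTask(pending, level))
      .collectFirst { case Some(task) => task }

  def main(args: Array[String]): Unit = {
    val pending = Seq(
      Task("nodeLocalTask", Some(Level.NODE_LOCAL)),
      Task("noPrefTask", None)
    )
    // The PROCESS_LOCAL pass finds nothing local and picks the no-preference task,
    // so the NODE_LOCAL task is never considered.
    println(resourceOffer(pending, Level.NODE_LOCAL).map(_.name)) // prints Some(noPrefTask)
  }
}
```

In this model the first call, at PROCESS_LOCAL, already returns the task without preferences, which is the inversion the JIRA reports.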
    
    
    ----
    
    
    I added an additional parameter, maxLocality, to resourceOffer and findTask, 
indicating when we should consider tasks without locality preferences.
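
One way to read the role of maxLocality is sketched below. This is an illustration under assumptions, not the code in this pull request: the `Level` enumeration, the `Task` case class, and the simplified signatures are hypothetical, and the rule "consider tasks without preferences only once the current level has reached maxLocality" is one plausible interpretation of the description.

```scala
// Simplified model of a maxLocality-aware lookup; NOT the actual patch.
object MaxLocalitySketch {
  // Locality levels in increasing order, mirroring the names used in the description.
  object Level extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, ANY = Value
  }

  // Hypothetical pending task: `locality` is the best level the task can achieve
  // on the offered executor, or None if it has no locality preference.
  final case class Task(name: String, locality: Option[Level.Value])

  // Assumed rule: tasks without locality preferences are considered only when
  // the level being tried has reached `maxLocality`, the highest allowed level.
  def findTask(pending: Seq[Task], level: Level.Value, maxLocality: Level.Value): Option[Task] = {
    val localized = pending.find(_.locality.exists(_ <= level))
    if (level >= maxLocality) localized.orElse(pending.find(_.locality.isEmpty))
    else localized
  }

  // Try increasing levels up to `maxLocality`, passing it through to findTask.
  def resourceOffer(pending: Seq[Task], maxLocality: Level.Value): Option[Task] =
    Level.values.iterator
      .takeWhile(_ <= maxLocality)
      .map(level => findTask(pending, level, maxLocality))
      .collectFirst { case Some(task) => task }

  def main(args: Array[String]): Unit = {
    val pending = Seq(
      Task("nodeLocalTask", Some(Level.NODE_LOCAL)),
      Task("noPrefTask", None)
    )
    // The PROCESS_LOCAL pass now returns nothing, so the NODE_LOCAL pass runs
    // and picks the NODE_LOCAL task ahead of the no-preference task.
    println(resourceOffer(pending, Level.NODE_LOCAL).map(_.name)) // prints Some(nodeLocalTask)
  }
}
```

Under this rule a task without preferences is still scheduled when no task within the allowed locality levels is pending, but it can no longer jump ahead of a NODE_LOCAL task.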

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/CodingCat/spark SPARK-2294

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1313.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1313
    
----
commit 35524413287990734685125ec02eb8dd58f97b12
Author: CodingCat <zhunans...@gmail.com>
Date:   2014-07-07T04:37:06Z

    fix locality inversion bug in TaskManager

----

