[ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated YARN-412:
------------------------------

    Description: 
In the FifoScheduler, the assignNodeLocalContainers method checks whether data 
is local to a node by looking up the node's nodeAddress in the application's 
set of outstanding requests.  This appears to be incorrect: it should look up 
the node's hostname instead.  The offending code is at line 455:

application.getResourceRequest(priority, node.getRMNode().getNodeAddress());

Requests are keyed by hostname (e.g. host1.foo.com), whereas node addresses 
are a concatenation of hostname and command port (e.g. host1.foo.com:1234), 
so a lookup by node address does not find the node-local request.

The CapacityScheduler performs the same lookup by hostname; see 
LeafQueue.assignNodeLocalContainers, line 1129:

application.getResourceRequest(priority, node.getHostName());
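The key mismatch can be illustrated with a small, self-contained Java sketch. It does not use the real YARN classes: the class NodeLocalityLookup, its getResourceRequest(requests, resourceName) helper (a stand-in that drops the priority argument), and the sample host and port values are hypothetical. It only models outstanding requests as a map keyed by resource name, as the report describes them, and compares a lookup by "host:port" with a lookup by hostname.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, NOT the YARN API: an application's outstanding
// requests keyed by resource name (a hostname, a rack name, or "*").
public class NodeLocalityLookup {

    // Stand-in for getResourceRequest(priority, resourceName): returns the
    // outstanding container count for that name, or null if there is none.
    static Integer getResourceRequest(Map<String, Integer> requests, String resourceName) {
        return requests.get(resourceName);
    }

    // Node address as described in the report: hostname plus command port.
    static String nodeAddress(String hostName, int commandPort) {
        return hostName + ":" + commandPort;
    }

    public static void main(String[] args) {
        String hostName = "host1.foo.com";
        Map<String, Integer> requests = new HashMap<>();
        requests.put(hostName, 2); // node-local request, keyed by hostname
        requests.put("*", 2);      // any-node request

        // FifoScheduler-style lookup by node address: no match (prints null).
        System.out.println("by nodeAddress: "
                + getResourceRequest(requests, nodeAddress(hostName, 1234)));
        // CapacityScheduler-style lookup by hostname: finds the request.
        System.out.println("by hostName: "
                + getResourceRequest(requests, hostName));
    }
}
```

Running main prints "by nodeAddress: null" followed by "by hostName: 2", which is the difference between the two lines of code quoted above.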

Note that this bug does not affect the actual scheduling decisions made by the 
FifoScheduler: even though it incorrectly determines that a request is not 
local to the node, it still schedules the request immediately because the 
request is also rack-local.  However, the bug may cause job status reports to 
underreport the number of node-local tasks.
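Why the placement survives but the locality count does not can be sketched as a locality-ordered pass: try node-local, then rack-local, then any node. This is a hypothetical, heavily simplified model, not the FifoScheduler source; the class LocalityAccounting, the Locality enum, the assign method, and the rack name "/rack1" are illustrative assumptions, and the sketch only reports the level at which a container would be placed (it does not decrement outstanding counts).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a locality-ordered assignment pass.
// Not the FifoScheduler source: it only shows that the wrong lookup key
// changes the locality label without changing whether a container is placed.
public class LocalityAccounting {

    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH, NONE }

    // Tries node-local (using nodeKey), then rack-local, then off-switch ("*").
    // Returns the locality level at which a container would be placed.
    static Locality assign(Map<String, Integer> requests, String nodeKey, String rack) {
        if (requests.getOrDefault(nodeKey, 0) > 0) {
            return Locality.NODE_LOCAL;
        }
        if (requests.getOrDefault(rack, 0) > 0) {
            return Locality.RACK_LOCAL;
        }
        if (requests.getOrDefault("*", 0) > 0) {
            return Locality.OFF_SWITCH;
        }
        return Locality.NONE;
    }

    public static void main(String[] args) {
        Map<String, Integer> requests = new HashMap<>();
        requests.put("host1.foo.com", 1);
        requests.put("/rack1", 1);
        requests.put("*", 1);

        // host:port key: the container is still placed, but labelled rack-local.
        System.out.println(assign(requests, "host1.foo.com:1234", "/rack1"));
        // Hostname key: the same placement is labelled node-local.
        System.out.println(assign(requests, "host1.foo.com", "/rack1"));
    }
}
```

Running main prints RACK_LOCAL and then NODE_LOCAL: in both cases a container is placed on the node, so scheduling is unchanged, but only the hostname lookup would be counted as node-local.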

  was:
In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
data is local to a node by searching for the nodeAddress of the node in the set 
of outstanding requests for the app.  This seems to be incorrect as it should 
be checking hostname instead.  The offending line of code is 455:

application.getResourceRequest(priority, node.getRMNode().getNodeAddress());

Requests are formated by hostname (e.g. host1.foo.com) where as node addresses 
are a concatenation of hostname and command port (e.g. host1.foo.com:1234)

In the CapacityScheduler, it's done using hostname.  See 
LeafQueue.assignNodeLocalContainers, line 1129

application.getResourceRequest(priority, node.getHostName());

Note that this but does not affect the actual scheduling decisions by the 
FifoScheduler because even though it incorrect determines that a request is not 
local to the node, it will still schedule the request immediately because it's 
rack-local.  However, this bug may be adversely affecting the reporting of job 
status by underreporting the number of tasks that were node local.

    
> FifoScheduler incorrectly checking for node locality
> ----------------------------------------------------
>
>                 Key: YARN-412
>                 URL: https://issues.apache.org/jira/browse/YARN-412
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Roger Hoover
>            Priority: Minor
>              Labels: patch
>         Attachments: YARN-412.patch
>

--
This message is automatically generated by JIRA.
