[ 
https://issues.apache.org/jira/browse/YARN-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828340#comment-13828340
 ] 

Bikas Saha commented on YARN-1412:
----------------------------------

What is the value of the following configuration? 
yarn.scheduler.capacity.node-locality-delay

It looks like you are hitting a bug that shows up when the number of 
container requests is small.

LeafQueue.assignContainersOnNode()
It looks like the scheduler falls back to off-switch assignment even when 
rackLocalityDelay has not been met. The delay check for off-switch assignment 
is basically (#different-locations/#nodes-in-cluster)*#containers < 
#node-heartbeats-without-assignment. In your case, if you have 20 nodes in all, 
the left-hand side is (2/20)*1 == 0.1. So the moment we skip 1 node (waiting 
for the locality delay) we end up assigning an off-switch container to the 
request.

Try the following: set the node locality delay mentioned at the beginning to 
the number of nodes in the cluster. Then, instead of asking for 1 container at 
pri 0, ask for 20 containers, each for a specific node, rack=false, relax=true. 
The off-switch locality delay above then becomes (20/20)*20 == 20 missed 
assignments.
If you see correct assignments, that confirms the above theory about the bug.
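
For illustration, the arithmetic above can be sketched in Java. This is only a 
model of the formula as described in this comment, not the actual LeafQueue 
code. The class and method names are made up for the example, and the inputs 
(2 locations and 1 container, then 20 locations and 20 containers, on a 
20-node cluster) are the two cases discussed above.

```java
public class OffSwitchDelaySketch {

    // Left-hand side of the check described above:
    // (#different-locations / #nodes-in-cluster) * #containers.
    // Hypothetical helper, not part of the YARN API.
    static float offSwitchThreshold(int differentLocations, int clusterNodes, int containers) {
        return ((float) differentLocations / clusterNodes) * containers;
    }

    // Off-switch assignment is allowed once the threshold is strictly below
    // the number of node heartbeats that passed without an assignment.
    static boolean canAssignOffSwitch(int differentLocations, int clusterNodes,
                                      int containers, int missedHeartbeats) {
        return offSwitchThreshold(differentLocations, clusterNodes, containers) < missedHeartbeats;
    }

    // Smallest number of missed heartbeats at which the check first passes.
    static int missesBeforeOffSwitch(int differentLocations, int clusterNodes, int containers) {
        int missed = 0;
        while (!canAssignOffSwitch(differentLocations, clusterNodes, containers, missed)) {
            missed++;
        }
        return missed;
    }

    public static void main(String[] args) {
        // Reported case: 1 container, host + rack = 2 locations, 20 nodes.
        System.out.println(missesBeforeOffSwitch(2, 20, 1));   // prints 1
        // Suggested experiment: 20 containers, one per node, 20 nodes.
        System.out.println(missesBeforeOffSwitch(20, 20, 20)); // prints 21
    }
}
```

Under this model the reported request goes off-switch after a single skipped 
node, while the 20-container request has to miss more than 20 heartbeats first.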

Btw, what you are trying to do (node=specific, rack=null and 
relaxLocality=true) is the default behavior of existing schedulers. They will 
always try to relax locality to rack and then off-switch by default. So you 
don't need to explicitly code for it. 

> Allocating Containers on a particular Node in Yarn
> --------------------------------------------------
>
>                 Key: YARN-1412
>                 URL: https://issues.apache.org/jira/browse/YARN-1412
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: centos, Hadoop 2.2.0
>            Reporter: gaurav gupta
>
> Summary of the problem: 
>  If I pass the node on which I want container and set relax locality default 
> which is true, I don't get back the container on the node specified even if 
> the resources are available on the node. It doesn't matter if I set rack or 
> not.
> Here is the snippet of the code that I am using
> AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
>     String host = "h1";
>     Resource capability = Records.newRecord(Resource.class);
>     capability.setMemory(memory);
>     nodes = new String[] {host};
>     // in order to request a host, we also have to request the rack
>     racks = new String[] {"/default-rack"};
>      List<ContainerRequest> containerRequests = new 
> ArrayList<ContainerRequest>();
>     List<ContainerId> releasedContainers = new ArrayList<ContainerId>();
>     containerRequests.add(new ContainerRequest(capability, nodes, racks, 
> Priority.newInstance(priority)));
>     if (containerRequests.size() > 0) {
>       LOG.info("Asking RM for containers: " + containerRequests);
>       for (ContainerRequest cr : containerRequests) {
>         LOG.info("Requested container: {}", cr.toString());
>         amRmClient.addContainerRequest(cr);
>       }
>     }
>     for (ContainerId containerId : releasedContainers) {
>       LOG.info("Released container, id={}", containerId.getId());
>       amRmClient.releaseAssignedContainer(containerId);
>     }
>     return amRmClient.allocate(0);



--
This message was sent by Atlassian JIRA
(v6.1#6144)
