[
https://issues.apache.org/jira/browse/YARN-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828340#comment-13828340
]
Bikas Saha commented on YARN-1412:
----------------------------------
What is the value of the following configuration?
yarn.scheduler.capacity.node-locality-delay
It looks like you are hitting a bug that shows up with a small number of
container requests. See LeafQueue.assignContainersOnNode().
It looks like the scheduler falls back to off-switch assignment even when
rackLocalityDelay is not met. The delay check for off-switch assignment is
basically (#different-locations/#nodes-in-cluster)*#containers <
#node-heartbeats-without-assignment. In your case, if you have 20 nodes in all,
(2/20)*1 == 0.1. So the moment we skip 1 node (waiting for the locality delay)
we end up assigning an off-switch container to the request.
Try the following: set the node locality delay mentioned at the beginning to
the number of nodes in the cluster. Then, instead of asking for 1 container at
pri 0, ask for 20 containers, each for a specific node, rack=false, relax=true.
The above off-switch locality delay will become (20/20)*20 == 20 missed
assignments.
If you see correct assignments, then the above theory about the bug is correct.
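To make the arithmetic above concrete, here is a minimal Java sketch of the off-switch check exactly as it is described in this comment. It is not the actual LeafQueue code; the class and method names (OffSwitchDelaySketch, offSwitchThreshold, canAssignOffSwitch) are hypothetical and exist only for illustration.

```java
// Illustrative sketch only: models the formula quoted in the comment,
// not the real CapacityScheduler implementation.
class OffSwitchDelaySketch {

    // (#different-locations / #nodes-in-cluster) * #containers
    static float offSwitchThreshold(int differentLocations, int clusterNodes, int containers) {
        return ((float) differentLocations / clusterNodes) * containers;
    }

    // Off-switch assignment is allowed once the number of node heartbeats
    // without an assignment exceeds the threshold.
    static boolean canAssignOffSwitch(int differentLocations, int clusterNodes,
                                      int containers, long missedHeartbeats) {
        return offSwitchThreshold(differentLocations, clusterNodes, containers) < missedHeartbeats;
    }

    public static void main(String[] args) {
        // Reported case: 2 locations, 20 nodes, 1 container.
        // The threshold is about 0.1, so a single skipped node is already enough.
        System.out.println(canAssignOffSwitch(2, 20, 1, 1));

        // Suggested experiment: 20 locations, 20 nodes, 20 containers.
        // The threshold is 20, so one skipped node is not enough.
        System.out.println(canAssignOffSwitch(20, 20, 20, 1));
    }
}
```

Under these assumptions the first call allows off-switch assignment after one missed heartbeat, while the second still holds the request back, which is the behavior the experiment is meant to expose.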
Btw, what you are trying to do (node=specific, rack=null and
relaxLocality=true) is the default behavior of existing schedulers. They will
always try to relax locality to rack and then off-switch by default. So you
don't need to explicitly code for it.
> Allocating Containers on a particular Node in Yarn
> --------------------------------------------------
>
> Key: YARN-1412
> URL: https://issues.apache.org/jira/browse/YARN-1412
> Project: Hadoop YARN
> Issue Type: Bug
> Environment: centos, Hadoop 2.2.0
> Reporter: gaurav gupta
>
> Summary of the problem:
> If I pass the node on which I want the container and leave relax locality at
> its default, which is true, I don't get the container back on the specified
> node even if the resources are available on that node. It doesn't matter
> whether I set the rack or not.
> Here is the snippet of the code that I am using:
> AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
> String host = "h1";
> Resource capability = Records.newRecord(Resource.class);
> capability.setMemory(memory);
> nodes = new String[] {host};
> // in order to request a host, we also have to request the rack
> racks = new String[] {"/default-rack"};
> List<ContainerRequest> containerRequests = new ArrayList<ContainerRequest>();
> List<ContainerId> releasedContainers = new ArrayList<ContainerId>();
> containerRequests.add(new ContainerRequest(capability, nodes, racks,
>     Priority.newInstance(priority)));
> if (containerRequests.size() > 0) {
>   LOG.info("Asking RM for containers: " + containerRequests);
>   for (ContainerRequest cr : containerRequests) {
>     LOG.info("Requested container: {}", cr.toString());
>     amRmClient.addContainerRequest(cr);
>   }
> }
> for (ContainerId containerId : releasedContainers) {
>   LOG.info("Released container, id={}", containerId.getId());
>   amRmClient.releaseAssignedContainer(containerId);
> }
> return amRmClient.allocate(0);