I think https://issues.apache.org/jira/browse/YARN-1412 was opened for this. I am afraid without the RM debug logs it will be hard to diagnose what is being observed.

Bikas

-----Original Message-----
From: Arun C Murthy [mailto:[email protected]]
Sent: Tuesday, November 19, 2013 7:30 AM
To: [email protected]
Subject: Re: Allocating Containers on a particular Node in Yarn

Sorry, I'm a little lost here. Can you please summarize the issue you are seeing? I can try to help.

Thanks.

On Nov 14, 2013, at 7:55 PM, Gaurav Gupta <[email protected]> wrote:

> Even after setting node-locality-delay to 50, it is not working
>
> -----Original Message-----
> From: Gaurav Gupta [mailto:[email protected]]
> Sent: Thursday, November 14, 2013 7:03 PM
> To: [email protected]
> Subject: RE: Allocating Containers on a particular Node in Yarn
>
> There are some other small applications running but the resources are
> available on every node of cluster so resources should not be problem.
>
> Following is the node-locality-delay setting
> <property>
>   <name>yarn.scheduler.capacity.node-locality-delay</name>
>   <value>1</value>
>   <description>
>     Number of missed scheduling opportunities after which the
>     CapacityScheduler attempts to schedule rack-local containers.
>     Typically this should be set to number of racks in the cluster, this
>     feature is disabled by default, set to -1.
>   </description>
> </property>
>
> I am attaching logs from the Application Master which shows the
> request being made and resources I am getting back
>
> 2013-11-14 18:48:09,016 main INFO util.RackResolver (RackResolver.java:coreResolve(109)) - Resolved node10.morado.com to /default-rack
> 2013-11-14 18:48:09,017 main DEBUG impl.AMRMClientImpl (AMRMClientImpl.java:addResourceRequest(540)) - Added priority=0
> 2013-11-14 18:48:09,023 main DEBUG impl.AMRMClientImpl (AMRMClientImpl.java:addResourceRequest(570)) - addResourceRequest: applicationId= priority=0 resourceName=node10.morado.com numContainers=1 #asks=1
> 2013-11-14 18:48:09,023 main DEBUG impl.AMRMClientImpl (AMRMClientImpl.java:addResourceRequest(570)) - addResourceRequest: applicationId= priority=0 resourceName=/default-rack numContainers=1 #asks=2
> 2013-11-14 18:48:09,023 main DEBUG impl.AMRMClientImpl (AMRMClientImpl.java:addResourceRequest(570)) - addResourceRequest: applicationId= priority=0 resourceName=* numContainers=1 #asks=3
> 2013-11-14 18:48:09,024 main INFO stram.StramAppMaster (StramAppMaster.java:sendContainerAskToRM(925)) - Requested container: Capability[<memory:8192, vCores:0>]Priority[1]
> 2013-11-14 18:48:09,024 main DEBUG impl.AMRMClientImpl (AMRMClientImpl.java:addResourceRequest(540)) - Added priority=1
> 2013-11-14 18:48:09,024 main DEBUG impl.AMRMClientImpl (AMRMClientImpl.java:addResourceRequest(570)) - addResourceRequest: applicationId= priority=1 resourceName=* numContainers=1 #asks=4
> 2013-11-14 18:48:09,024 main INFO stram.StramAppMaster (StramAppMaster.java:sendContainerAskToRM(925)) - Requested container: Capability[<memory:8192, vCores:0>]Priority[2]
> 2013-11-14 18:48:09,024 main INFO util.RackResolver (RackResolver.java:coreResolve(109)) - Resolved node18.morado.com to /default-rack
> 2013-11-14 18:48:09,024 main DEBUG impl.AMRMClientImpl (AMRMClientImpl.java:addResourceRequest(540)) - Added priority=2
> 2013-11-14 18:48:09,024 main DEBUG impl.AMRMClientImpl (AMRMClientImpl.java:addResourceRequest(570)) - addResourceRequest: applicationId= priority=2 resourceName=node18.morado.com numContainers=1 #asks=5
> 2013-11-14 18:48:09,025 main DEBUG impl.AMRMClientImpl (AMRMClientImpl.java:addResourceRequest(570)) - addResourceRequest: applicationId= priority=2 resourceName=/default-rack numContainers=1 #asks=6
> 2013-11-14 18:48:09,025 main DEBUG impl.AMRMClientImpl (AMRMClientImpl.java:addResourceRequest(570)) - addResourceRequest: applicationId= priority=2 resourceName=* numContainers=1 #asks=7
> 2013-11-14 18:48:10,063 main INFO stram.StramAppMaster (StramAppMaster.java:execute(764)) - Got new container., containerId=container_1384399307129_0027_01_000002, containerNode=node8.morado.com:51530, containerNodeURI=node8.morado.com:8042, containerResourceMemory8192, priority0
> 2013-11-14 18:48:10,218 main INFO stram.StramAppMaster (StramAppMaster.java:execute(764)) - Got new container., containerId=container_1384399307129_0027_01_000003, containerNode=node8.morado.com:51530, containerNodeURI=node8.morado.com:8042, containerResourceMemory8192, priority1
> 2013-11-14 18:48:10,235 main INFO stram.StramAppMaster (StramAppMaster.java:execute(764)) - Got new container., containerId=container_1384399307129_0027_01_000004, containerNode=node37.morado.com:50631, containerNodeURI=node37.morado.com:8042, containerResourceMemory8192, priority2
>
> -Gaurav
>
> -----Original Message-----
> From: Bikas Saha [mailto:[email protected]]
> Sent: Thursday, November 14, 2013 6:37 PM
> To: [email protected]
> Subject: RE: Allocating Containers on a particular Node in Yarn
>
> What else is running on the cluster? What is the locality delay value
> set to? This value is not time. It is the number of node heartbeat to
> wait before assigning a rack local container.
> So if that many nodes heartbeated to the RM before the RM could
> assign a node local machine to that request then it will assign a
> rack local machine.
>
> It is interesting that if you don't specify the rack, i.e. you want the
> exact machine, even then you are not getting the exact machine. You
> should either get the exact machine or your request will not be
> fulfilled. You should never get a different machine. If this is what
> you observe then please open a bug on Jira and attach the RM logs
> mentioning the machine name and container id that were erroneous. You
> will probably have to enable debug logs on the RM before you get the repro.
>
> Bikas
>
> -----Original Message-----
> From: Gaurav Gupta [mailto:[email protected]]
> Sent: Thursday, November 14, 2013 5:48 PM
> To: [email protected]
> Subject: RE: Allocating Containers on a particular Node in Yarn
>
> Hi Bikas,
>
> With scheduler delay on and relax locality set to true (with and
> without requesting the rack), I don't get the containers on the
> required host. It always assigns a different host.
> I am using default Capacity scheduler.
> Here is the snippet of the code
>
>     AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
>     String host = "h1";
>     Resource capability = Records.newRecord(Resource.class);
>     capability.setMemory(memory);
>     nodes = new String[] {host};
>     // in order to request a host, we also have to request the rack
>     racks = new String[] {"/default-rack"};
>     List<ContainerRequest> containerRequests = new ArrayList<ContainerRequest>();
>     List<ContainerId> releasedContainers = new ArrayList<ContainerId>();
>     containerRequests.add(new ContainerRequest(capability, nodes, racks,
>         Priority.newInstance(priority), false));
>     if (containerRequests.size() > 0) {
>       LOG.info("Asking RM for containers: " + containerRequests);
>       for (ContainerRequest cr : containerRequests) {
>         LOG.info("Requested container: {}", cr.toString());
>         amRmClient.addContainerRequest(cr);
>       }
>     }
>
>     for (ContainerId containerId : releasedContainers) {
>       LOG.info("Released container, id={}", containerId.getId());
>       amRmClient.releaseAssignedContainer(containerId);
>     }
>     return amRmClient.allocate(0);
>
> Thanks
> Gaurav
>
> -----Original Message-----
> From: Bikas Saha [mailto:[email protected]]
> Sent: Wednesday, November 13, 2013 7:05 PM
> To: [email protected]
> Subject: RE: Allocating Containers on a particular Node in Yarn
>
> What you ask, try on the requested node and then fall back to others, is
> the default behavior for current schedulers in yarn. I.e. relaxLocality is
> true by default.
>
> -----Original Message-----
> From: Thomas Weise [mailto:[email protected]]
> Sent: Wednesday, November 13, 2013 3:55 PM
> To: [email protected]
> Subject: Re: Allocating Containers on a particular Node in Yarn
>
> Is it possible to specify a particular node and have RM fall back to a
> different node only after making an attempt to allocate for the
> requested node? In other words, is the combination of specific host
> name and relaxLocality=TRUE meaningful at all?
>
> Thanks.
>
>
> On Wed, Nov 13, 2013 at 3:23 PM, Alejandro Abdelnur
> <[email protected]> wrote:
>
>> Gaurav,
>>
>> Setting relaxLocality to FALSE should do it.
>>
>> thanks.
>>
>>
>> On Wed, Nov 13, 2013 at 2:58 PM, gaurav <[email protected]> wrote:
>>
>>> Hi,
>>> I am trying to allocate containers on a particular node in Yarn but
>>> Yarn is returning me containers on different node although the
>>> requested node has resources available.
>>>
>>> I checked into the allocate(AllocateRequest request) function of
>>> ApplicationMasterService and my request is as follows
>>>
>>> *request:
>>> ask { priority { priority: 1 } resource_name: "h2" capability { memory: 1000 } num_containers: 2 }
>>> ask { priority { priority: 1 } resource_name: "/default-rack" capability { memory: 1000 } num_containers: 2 }
>>> ask { priority { priority: 1 } resource_name: "*" capability { memory: 1000 } num_containers: 2 }
>>> response_id: 1 progress: 0.0*
>>>
>>> but the containers that I am getting back is as follows
>>> [Container: [ContainerId: container_1384381084244_0001_01_000002, NodeId: h1:1234, NodeHttpAddress: h1:2, Resource: <memory:1024, vCores:1>, Priority: 1, Token: Token { kind: ContainerToken, service: h1:1234 }, ],
>>> Container: [ContainerId: container_1384381084244_0001_01_000003, NodeId: h1:1234, NodeHttpAddress: h1:2, Resource: <memory:1024, vCores:1>, Priority: 1, Token: Token { kind: ContainerToken, service: h1:1234 }, ]]
>>>
>>> I am attaching the test case that I have written along with the
>>> mail. It uses classes under
>>> org.apache.hadoop.yarn.server.resourcemanager package.
>>>
>>> Any pointers would be of great help
>>>
>>> Thanks
>>> Gaurav
>>>
>>
>>
>> --
>> Alejandro
>>
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or
> entity to which it is addressed and may contain information that is
> confidential, privileged and exempt from disclosure under applicable
> law.
> If the reader of this message is not the intended recipient, you
> are hereby notified that any printing, copying, dissemination,
> distribution, disclosure or forwarding of this communication is
> strictly prohibited. If you have received this communication in error,
> please contact the sender immediately and delete it from your system. Thank You.

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
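
For readers following the thread, here is a minimal sketch of the locality delay as Bikas describes it: the delay is a count of node heartbeats (missed scheduling opportunities), not a time, and a request with relaxLocality=false should only ever be satisfied on the exact machine. This is a toy model written for illustration and is not the CapacityScheduler code. The names `LocalityDelaySketch`, `Request`, `Decision` and `onHeartbeat` are hypothetical, and the model assumes a single pending container, ignores resource availability, and ignores off-switch (`*`) assignments.

```java
/** Toy model of the locality delay discussed in this thread; not the CapacityScheduler implementation. */
final class LocalityDelaySketch {

    /** Outcome of offering one heartbeating node to a pending request. */
    enum Decision { NODE_LOCAL, RACK_LOCAL, SKIP }

    /** One pending request for a single container on a preferred host (hypothetical type). */
    static final class Request {
        final String host;
        final String rack;
        final boolean relaxLocality;
        int missedOpportunities = 0; // heartbeats from nodes other than the requested host

        Request(String host, String rack, boolean relaxLocality) {
            this.host = host;
            this.rack = rack;
            this.relaxLocality = relaxLocality;
        }
    }

    /**
     * Called once for every node heartbeat while the request is still pending.
     * nodeLocalityDelay plays the role of yarn.scheduler.capacity.node-locality-delay.
     */
    static Decision onHeartbeat(Request req, String node, String nodeRack, int nodeLocalityDelay) {
        if (node.equals(req.host)) {
            return Decision.NODE_LOCAL; // the exact machine that was asked for
        }
        if (!req.relaxLocality) {
            return Decision.SKIP; // strict request: never settle for a different machine
        }
        req.missedOpportunities++;
        // Assumption of this sketch: fall back to the rack only once more heartbeats
        // than the configured delay have gone by without a node-local match.
        if (nodeRack.equals(req.rack) && req.missedOpportunities > nodeLocalityDelay) {
            return Decision.RACK_LOCAL;
        }
        return Decision.SKIP;
    }

    public static void main(String[] args) {
        // Host names taken from the AM log earlier in the thread; every node is in /default-rack.
        Request relaxed = new Request("node10.morado.com", "/default-rack", true);
        String[] heartbeats = {"node8.morado.com", "node37.morado.com", "node10.morado.com"};
        for (String node : heartbeats) {
            Decision d = onHeartbeat(relaxed, node, "/default-rack", 1);
            System.out.println(node + " -> " + d);
            if (d != Decision.SKIP) {
                break; // the single container has been placed
            }
        }
    }
}
```

In this model, with a delay of 1 and every node in the same rack, the second heartbeat from a node other than the requested host is enough to produce a rack-local assignment, so on a busy cluster a small delay rarely waits long enough for the requested node to report in. With relaxLocality=false the model never returns anything but the requested host, which is the behavior Bikas says to expect. It does not attempt to explain the behavior tracked in YARN-1412.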
