Prabhu Joseph created YARN-10293:
------------------------------------

             Summary: Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259)
                 Key: YARN-10293
                 URL: https://issues.apache.org/jira/browse/YARN-10293
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 3.3.0
            Reporter: Prabhu Joseph
            Assignee: Prabhu Joseph
Reserved Containers are not allocated from the available space of other nodes in the CandidateNodeSet in MultiNodePlacement. YARN-10259 fixed two issues related to this: https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987

Have found one more bug in the CapacityScheduler.java code which causes the same issue, with a slight difference in the repro.

*Repro:*

*Nodes : Capacity : Used*
Node1 - 8GB, 8vcores - 8GB, 8vcores
Node2 - 8GB, 8vcores - 8GB, 8vcores
Node3 - 8GB, 8vcores - 8GB, 8vcores

Queues -> A and B, both 50% capacity and 100% max capacity
MultiNode enabled + Preemption enabled

1. JobA is submitted to queue A and uses the full cluster: 24GB and 24 vcores.

2. JobB is submitted to queue B with an AM size of 1GB.
{code}
2020-05-21 12:12:27,313 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest IP=172.27.160.139 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1590046667304_0005 CALLERCONTEXT=CLI QUEUENAME=dummy
{code}

3. Preemption kicks in and the used capacity drops below 1.0f.
{code}
2020-05-21 12:12:48,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics: Non-AM container preempted, current appAttemptId=appattempt_1590046667304_0004_000001, containerId=container_e09_1590046667304_0004_01_000024, resource=<memory:1024, vCores:1>
{code}

4.
JobB gets a Reserved Container as part of CapacityScheduler#allocateOrReserveNewContainer.
{code}
2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e09_1590046667304_0005_01_000001 Container Transitioned from NEW to RESERVED
2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
{code}

*Why did RegularContainerAllocator reserve the container when the used capacity is < 1.0f?*

Even though the container has been preempted, the NodeManager still has to stop the container, heartbeat, and report the updated available and unallocated resources to the ResourceManager. Until that happens, the node still reports available=<memory:0, vCores:0>, so the allocator can only reserve.

5. Now no new allocation happens and the container stays reserved. After the reservation the used capacity becomes 1.0f, the cycle below repeats forever, and nothing new is allocated or reserved. The reserved container cannot be allocated because the reserved node (Node3) has no space. Node2 has space for 1GB, 1 vcore, but CapacityScheduler#allocateOrReserveNewContainers never gets called, causing the hang.
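The used-capacity arithmetic behind steps 3-5 can be sketched as follows. This is a hypothetical illustration in plain Java, not YARN code; the class and method names are made up, and only the numbers (3 x 8GB nodes, one 1GB preempted container, a 1GB AM reservation) come from the repro above.

```java
// Hypothetical sketch (not YARN code): the used-capacity arithmetic behind
// steps 3-5 of the repro. Cluster = 3 nodes * 8GB = 24GB, fully used by JobA.
public class UsedCapacitySketch {

    // Used capacity as a fraction of cluster capacity: used / total.
    static float usedCapacity(int usedMb, int clusterMb) {
        return (float) usedMb / clusterMb;
    }

    public static void main(String[] args) {
        int clusterMb = 3 * 8192; // Node1..Node3, 8GB each

        // Preempting one 1GB container drops the scheduler's used capacity
        // below 1.0f right away, even though the node keeps reporting
        // available=<memory:0, vCores:0> until the NodeManager heartbeats.
        float afterPreemption = usedCapacity(clusterMb - 1024, clusterMb);
        System.out.println(afterPreemption < 1.0f); // true

        // So RegularContainerAllocator reserves JobB's 1GB AM on a still-full
        // node. The reservation itself counts as used, which puts the used
        // capacity back at exactly 1.0f, where it stays.
        float afterReservation = usedCapacity(clusterMb, clusterMb);
        System.out.println(afterReservation >= 1.0f); // true
    }
}
```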
*[INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes -> CapacityScheduler#allocateFromReservedContainer -> Node3 has the reserved container*
{code}
2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1590046667304_0005 on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignContainers: partition= #applications=1
2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
2020-05-21 12:13:33,243 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Allocation proposal accepted
{code}

CapacityScheduler#allocateOrReserveNewContainers won't be called, because the below check in allocateContainersOnMultiNodes short-circuits it: the used capacity is 1.0f (the reservation counts as used) and preemption has no killable resource left, so the method returns without ever considering Node2.
{code}
if (getRootQueue().getQueueCapacities().getUsedCapacity(
    candidates.getPartition()) >= 1.0f
    && preemptionManager.getKillableResource(
{code}
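The hang can be modeled in a few lines. This is a hypothetical stand-in, not the real CapacityScheduler: skipsNewAllocation and the constants below are made-up names that mirror the guard shown above, with the state values taken from step 5 of the repro.

```java
// Hypothetical stand-in (not the real CapacityScheduler): once the
// reservation pushes used capacity to 1.0f and nothing is killable, every
// scheduling pass takes the early return, so allocateOrReserveNewContainers
// is never reached even though Node2 has 1GB/1vcore free.
public class MultiNodeGuardSketch {

    // Scheduler state after step 5 of the repro (values from the logs above).
    static final float USED_CAPACITY = 1.0f; // reservation counts as used
    static final int KILLABLE_MB = 0;        // preemption already finished
    static final int RESERVED_NODE_FREE_MB = 0;    // Node3 is full
    static final int OTHER_NODE_FREE_MB = 1024;    // Node2 could fit the AM

    // Mirrors the guard in allocateContainersOnMultiNodes: when the cluster
    // looks full and nothing is killable, skip new allocations/reservations.
    static boolean skipsNewAllocation(float usedCapacity, int killableMb) {
        return usedCapacity >= 1.0f && killableMb == 0;
    }

    public static void main(String[] args) {
        // Each pass only retries the reserved node, which has no space ...
        boolean reservedNodeFits = RESERVED_NODE_FREE_MB >= 1024;
        // ... and the guard blocks the path that would consider Node2.
        boolean skipped = skipsNewAllocation(USED_CAPACITY, KILLABLE_MB);

        System.out.println("reserved node fits AM: " + reservedNodeFits); // false
        System.out.println("new-allocation path skipped: " + skipped);    // true
        // Neither path can place the 1GB AM, so the scheduler loops forever.
    }
}
```

The sketch also shows why the repro needs preemption to have fully finished: with any killable resource left, the guard would not short-circuit.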