[
https://issues.apache.org/jira/browse/YARN-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prabhu Joseph updated YARN-10293:
---------------------------------
Attachment: YARN-10293-005.patch
> Reserved Containers not allocated from available space of other nodes in
> CandidateNodeSet in MultiNodePlacement (YARN-10259)
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-10293
> URL: https://issues.apache.org/jira/browse/YARN-10293
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Attachments: YARN-10293-001.patch, YARN-10293-002.patch,
> YARN-10293-003-WIP.patch, YARN-10293-004.patch, YARN-10293-005.patch
>
>
> Reserved containers are not allocated from the available space of other nodes
> in the CandidateNodeSet in MultiNodePlacement. YARN-10259 fixed two related
> issues:
> https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987
> I have found one more bug in the CapacityScheduler.java code that causes the
> same issue with a slightly different repro.
> *Repro:*
> *Node : Capacity : Used*
> Node1 - 8GB, 8vcores - 8GB, 8vcores
> Node2 - 8GB, 8vcores - 8GB, 8vcores
> Node3 - 8GB, 8vcores - 8GB, 8vcores
> Queues -> A and B, both 50% capacity and 100% max capacity
> MultiNode placement enabled + preemption enabled
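> For reference, a minimal sketch of the scheduler configuration this repro
> assumes. The property names are taken from the CapacityScheduler multi-node
> placement and preemption documentation, not from this cluster's actual config,
> so treat them as assumptions and verify against your Hadoop version:
> {code}
> import org.apache.hadoop.conf.Configuration;
>
> // Minimal sketch of the assumed setup: two 50% queues, multi-node placement
> // with a resource-usage based sorting policy, and preemption turned on.
> public class MultiNodeReproConf {
>   public static Configuration build() {
>     Configuration conf = new Configuration();
>
>     // Queues A and B, both 50% capacity and 100% max capacity
>     conf.set("yarn.scheduler.capacity.root.queues", "A,B");
>     conf.set("yarn.scheduler.capacity.root.A.capacity", "50");
>     conf.set("yarn.scheduler.capacity.root.B.capacity", "50");
>     conf.set("yarn.scheduler.capacity.root.A.maximum-capacity", "100");
>     conf.set("yarn.scheduler.capacity.root.B.maximum-capacity", "100");
>
>     // Multi-node placement (property names as documented; please verify)
>     conf.set("yarn.scheduler.capacity.multi-node-placement-enabled", "true");
>     conf.set("yarn.scheduler.capacity.multi-node-sorting.policy.names",
>         "resource-based");
>     conf.set("yarn.scheduler.capacity.multi-node-sorting.policy",
>         "resource-based");
>     conf.set("yarn.scheduler.capacity.multi-node-sorting.policy.resource-based.class",
>         "org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement"
>             + ".ResourceUsageMultiNodeLookupPolicy");
>
>     // Preemption via the capacity scheduler monitor policy
>     conf.setBoolean("yarn.resourcemanager.scheduler.monitor.enable", true);
>     conf.set("yarn.resourcemanager.scheduler.monitor.policies",
>         "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity"
>             + ".ProportionalCapacityPreemptionPolicy");
>
>     return conf;
>   }
> }
> {code}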
> 1. JobA is submitted to queue A and uses the full cluster: 24GB and 24 vcores.
> 2. JobB is submitted to queue B with an AM size of 1GB.
> {code}
> 2020-05-21 12:12:27,313 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest
> IP=172.27.160.139 OPERATION=Submit Application Request
> TARGET=ClientRMService RESULT=SUCCESS APPID=application_1590046667304_0005
> CALLERCONTEXT=CLI QUEUENAME=dummy
> {code}
> 3. Preemption happens and the used capacity drops below 1.0f
> {code}
> 2020-05-21 12:12:48,222 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics:
> Non-AM container preempted, current
> appAttemptId=appattempt_1590046667304_0004_000001,
> containerId=container_e09_1590046667304_0004_01_000024,
> resource=<memory:1024, vCores:1>
> {code}
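> Worked numbers for this step (illustrative only; the real used-capacity
> accounting is per partition and per queue, but the idea is the same, assuming
> just the one 1GB/1vcore container above has been preempted):
> {code}
> // Cluster: 3 nodes x 8GB = 24GB total. JobA held all of it; one 1GB container
> // has just been preempted, so usage dips below 1.0f until the reservation lands.
> public class UsedCapacityAfterPreemption {
>   public static void main(String[] args) {
>     float clusterMemoryGb = 3 * 8f;                       // 24 GB total
>     float usedMemoryGb = 24f - 1f;                        // 23 GB still allocated
>     float usedCapacity = usedMemoryGb / clusterMemoryGb;  // ~0.958f
>     System.out.println("used capacity = " + usedCapacity + " (< 1.0f)");
>   }
> }
> {code}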
> 4. JobB gets a Reserved Container as part of
> CapacityScheduler#allocateOrReserveNewContainer
> {code}
> 2020-05-21 12:12:48,226 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_e09_1590046667304_0005_01_000001 Container Transitioned from NEW to
> RESERVED
> 2020-05-21 12:12:48,226 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
> Reserved container=container_e09_1590046667304_0005_01_000001, on node=host:
> tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8
> available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with
> resource=<memory:1024, vCores:1>
> {code}
> *Why did RegularContainerAllocator reserve the container when the used capacity
> is <= 1.0f?*
> Even though the container has been preempted, the NodeManager still has to stop
> the container and then report the freed (available/unallocated) resources back
> to the ResourceManager on its next heartbeat. Until that heartbeat arrives, the
> node still looks full to the scheduler, so the allocator reserves instead of
> allocating.
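> A simplified sketch of that reserve-instead-of-allocate decision (illustrative
> only, not the actual RegularContainerAllocator code; only the Resource and
> Resources calls are real YARN API):
> {code}
> import org.apache.hadoop.yarn.api.records.Resource;
> import org.apache.hadoop.yarn.util.resource.Resources;
>
> // Right after the preemption the node the scheduler looks at still reports
> // 0 unallocated resources, so the request cannot fit and the container is
> // RESERVED on that node instead of being allocated.
> public class ReserveVsAllocateSketch {
>   public static void main(String[] args) {
>     Resource required = Resource.newInstance(1024, 1);   // JobB's AM request
>     Resource unallocated = Resource.newInstance(0, 0);   // node still looks full
>
>     if (Resources.fitsIn(required, unallocated)) {
>       System.out.println("allocate on this node");
>     } else {
>       System.out.println("reserve on this node and retry on later heartbeats");
>     }
>   }
> }
> {code}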
> 5. Now no new allocation happens and the reserved container stays reserved.
> Once the reservation is counted, the used capacity is back at 1.0f, and the
> sequence below loops forever without allocating or reserving anything new. The
> reserved container cannot be allocated because the reserved node has no space,
> and although node2 has room for 1GB, 1 vcore,
> CapacityScheduler#allocateOrReserveNewContainers never gets called, which
> causes the hang.
> *[INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes ->
> CapacityScheduler#allocateFromReservedContainer -> re-reserve the container
> on the same node*
> {code}
> 2020-05-21 12:13:33,242 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Trying to fulfill reservation for application application_1590046667304_0005
> on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
> 2020-05-21 12:13:33,242 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> assignContainers: partition= #applications=1
> 2020-05-21 12:13:33,242 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
> Reserved container=container_e09_1590046667304_0005_01_000001, on node=host:
> tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8
> available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with
> resource=<memory:1024, vCores:1>
> 2020-05-21 12:13:33,243 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Allocation proposal accepted
> {code}
> CapacityScheduler#allocateOrReserveNewContainers is never called because the
> check below in allocateContainersOnMultiNodes trips and allocation on the other
> nodes is skipped:
> {code}
> if (getRootQueue().getQueueCapacities().getUsedCapacity(
>     candidates.getPartition()) >= 1.0f
>     && preemptionManager.getKillableResource(
> {code}
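> The guard above is truncated by mail wrapping; here is how I believe the full
> check in CapacityScheduler#allocateContainersOnMultiNodes reads around 3.3.0
> (a hedged reconstruction, please verify against the actual source):
> {code}
> // Assumed reconstruction of the full check; confirm against CapacityScheduler.java.
> // With the reservation counted, used capacity is back at 1.0f (23GB allocated +
> // 1GB reserved out of 24GB) and, once preemption has finished, there is no
> // killable resource left. The condition is therefore true, the method bails out
> // early, and allocateOrReserveNewContainers() is never tried against node2's
> // free 1GB/1vcore.
> if (getRootQueue().getQueueCapacities().getUsedCapacity(
>     candidates.getPartition()) >= 1.0f
>     && preemptionManager.getKillableResource(
>         CapacitySchedulerConfiguration.ROOT, candidates.getPartition())
>         == Resources.none()) {
>   // no available or killable resource in this partition -> skip new allocation
>   return null;
> }
> {code}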
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]