Prabhu Joseph created YARN-10293:
------------------------------------
Summary: Reserved Containers not allocated from available space of
other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259)
Key: YARN-10293
URL: https://issues.apache.org/jira/browse/YARN-10293
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph
Reserved containers are not allocated from the available space of other nodes in
the CandidateNodeSet in MultiNodePlacement. YARN-10259 fixed two related issues:
https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987
We have found one more bug in the CapacityScheduler.java code which causes the
same issue, with a slightly different repro.
*Repro:*
*Nodes : Total : Used*
Node1 - 8GB, 8vcores - 8GB, 8vcores
Node2 - 8GB, 8vcores - 8GB, 8vcores
Node3 - 8GB, 8vcores - 8GB, 8vcores
Queues -> A and B, both with 50% capacity and 100% max capacity
MultiNode placement enabled + preemption enabled
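The setup above can be sketched as a capacity-scheduler.xml fragment. This is an assumed configuration reconstructed from the repro, not the reporter's actual file; queue names A/B follow the repro, and preemption is additionally enabled via yarn.resourcemanager.scheduler.monitor.enable in yarn-site.xml:

{code:xml}
<!-- Hypothetical capacity-scheduler.xml fragment matching the repro setup. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>A,B</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.A.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.A.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.B.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.B.maximum-capacity</name>
  <value>100</value>
</property>
<!-- Enable multi-node placement (YARN 3.3.0) -->
<property>
  <name>yarn.scheduler.capacity.multi-node-placement-enabled</name>
  <value>true</value>
</property>
{code}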
1. JobA is submitted to queue A and uses the full cluster: 24GB and 24 vcores
2. JobB is submitted to queue B with an AM size of 1GB
{code}
2020-05-21 12:12:27,313 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest
IP=172.27.160.139 OPERATION=Submit Application Request
TARGET=ClientRMService RESULT=SUCCESS APPID=application_1590046667304_0005
CALLERCONTEXT=CLI QUEUENAME=dummy
{code}
3. Preemption happens and the used capacity drops below 1.0f
{code}
2020-05-21 12:12:48,222 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics:
Non-AM container preempted, current
appAttemptId=appattempt_1590046667304_0004_000001,
containerId=container_e09_1590046667304_0004_01_000024, resource=<memory:1024,
vCores:1>
{code}
4. JobB gets a reserved container via
CapacityScheduler#allocateOrReserveNewContainer
{code}
2020-05-21 12:12:48,226 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e09_1590046667304_0005_01_000001 Container Transitioned from NEW to
RESERVED
2020-05-21 12:12:48,226 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
Reserved container=container_e09_1590046667304_0005_01_000001, on node=host:
tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8
available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with
resource=<memory:1024, vCores:1>
{code}
*Why did RegularContainerAllocator reserve the container when the used capacity
was <= 1.0f?*

Even though the container has been preempted, the NodeManager still has to stop
the container and heartbeat before the available and unallocated resources are
updated on the ResourceManager. Until that heartbeat, the node still reports no
free space, so the allocator reserves instead of allocating.
5. Now no new allocation happens and the container stays reserved. After the
reservation the used capacity becomes 1.0f; the code below loops and no new
allocate or reserve happens. The reserved container cannot be allocated because
the reserved node (node3) has no space. Node2 has space for 1GB, 1 vcore, but
CapacityScheduler#allocateOrReserveNewContainers never gets called, causing the
hang.
*[INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes ->
CapacityScheduler#allocateContainersOnMultiNodes#allocateFromReservedContainer
-> Node3 has reserved container*
{code}
2020-05-21 12:13:33,242 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Trying to fulfill reservation for application application_1590046667304_0005
on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
2020-05-21 12:13:33,242 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
assignContainers: partition= #applications=1
2020-05-21 12:13:33,242 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
Reserved container=container_e09_1590046667304_0005_01_000001, on node=host:
tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8
available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with
resource=<memory:1024, vCores:1>
2020-05-21 12:13:33,243 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Allocation proposal accepted
{code}
CapacityScheduler#allocateOrReserveNewContainers won't be called because the
below check in allocateContainersOnMultiNodes fails:
{code}
if (getRootQueue().getQueueCapacities().getUsedCapacity(
    candidates.getPartition()) >= 1.0f
    && preemptionManager.getKillableResource(
{code}
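The hang can be sketched with a minimal, self-contained Java snippet (a hypothetical class with simplified types, not the real YARN code, which compares Resource objects against Resources.none()): once the reservation pushes used capacity to 1.0f and preemption leaves nothing killable, the guard above skips the new-allocation path forever.

{code:java}
// Minimal sketch of the guard in allocateContainersOnMultiNodes.
// Hypothetical simplified types: usedCapacity stands in for the root queue's
// used capacity for the partition, killableMemMB for the killable resource.
public class MultiNodeGuardSketch {

  // Returns true when the scheduler may try a new allocation/reservation.
  static boolean shouldTryNewAllocation(float usedCapacity,
      long killableMemMB) {
    // Mirrors: usedCapacity >= 1.0f && killableResource == none -> skip
    return !(usedCapacity >= 1.0f && killableMemMB == 0);
  }

  public static void main(String[] args) {
    // After preemption, before the reservation: used capacity < 1.0f,
    // so allocateOrReserveNewContainers runs and reserves on node3.
    System.out.println(shouldTryNewAllocation(0.96f, 0));

    // After the reservation: used capacity == 1.0f and preemption is done,
    // so the new-allocation path is skipped on every scheduling cycle even
    // though node2 has 1GB, 1 vcore free for the reserved container.
    System.out.println(shouldTryNewAllocation(1.0f, 0));
  }
}
{code}

In the repro, the node3 reservation can only be satisfied by this skipped path picking node2 from the CandidateNodeSet, which is why the scheduler loops on trying to fulfill the reservation.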
--
This message was sent by Atlassian Jira
(v8.3.4#803005)