Prabhu Joseph created YARN-10259:
------------------------------------
Summary: Reserved Containers not allocated from available space of
other nodes in CandidateNodeSet in MultiNodePlacement
Key: YARN-10259
URL: https://issues.apache.org/jira/browse/YARN-10259
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 3.2.0, 3.3.0
Reporter: Prabhu Joseph
Reserved Containers are not allocated from the available space of other nodes
in CandidateNodeSet in MultiNodePlacement.
*Repro:*
1. Enable MultiNode Placement.
2. Start two nodes, h1 and h2, each with 8GB.
3. Submit app1 AM (5GB), which gets placed on h1, and app2 AM (5GB), which gets
placed on h2.
4. Submit app3 AM (5GB), which fits on neither node and is reserved on h1.
5. Kill app2, which frees space on h2.
6. The app3 AM never gets ALLOCATED.
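
The memory arithmetic behind these steps can be sketched as below. This is a toy model only, not CapacityScheduler code: the class ReproModel and its methods are hypothetical names chosen for illustration, and only memory is tracked (vCores are ignored).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the repro steps. Hypothetical names; not Hadoop code. */
final class ReproModel {
    static final int NODE_MB = 8192; // each node has 8GB
    static final int AM_MB = 5120;   // each AM asks for 5GB

    /** Returns free memory per node for an empty two-node cluster. */
    static Map<String, Integer> freshCluster() {
        Map<String, Integer> free = new LinkedHashMap<>();
        free.put("h1", NODE_MB);
        free.put("h2", NODE_MB);
        return free;
    }

    /** True if the node currently has at least askMb free. */
    static boolean fits(Map<String, Integer> free, String node, int askMb) {
        return free.get(node) >= askMb;
    }

    /** Takes mb from the node's free memory. */
    static void allocate(Map<String, Integer> free, String node, int mb) {
        free.merge(node, -mb, Integer::sum);
    }

    /** Returns mb to the node's free memory. */
    static void release(Map<String, Integer> free, String node, int mb) {
        free.merge(node, mb, Integer::sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> free = freshCluster();
        allocate(free, "h1", AM_MB); // step 3: app1 AM on h1
        allocate(free, "h2", AM_MB); // step 3: app2 AM on h2

        // Step 4: app3 AM fits nowhere (3072 MB free per node), so it is reserved.
        System.out.println("fits h1: " + fits(free, "h1", AM_MB));
        System.out.println("fits h2: " + fits(free, "h2", AM_MB));

        // Step 5: killing app2 frees h2 completely.
        release(free, "h2", AM_MB);
        System.out.println("fits h2 after kill: " + fits(free, "h2", AM_MB));
    }
}
```

After step 5 the model shows h2 with enough free memory for the app3 AM, while the reservation still sits on h1, which has only 3072 MB free.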
The RM logs show the YARN-8127 fix rejecting the allocation proposal for the
app3 AM on h2, because it expects the assignment to be on the same node where
the reservation was made.
{code}
2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler]
scheduler.SchedulerApplicationAttempt
(SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt
appattempt_1588684773609_0003_000001 reserved container
container_1588684773609_0003_01_000001 on node host: h1:1234 #containers=1
available=<memory:3072, vCores:7> used=<memory:5120, vCores:1>. This attempt
currently has 1 reserved containers at priority 0; currentReservation
<memory:5120, vCores:1>
2020-05-05 18:49:37,264 INFO [AsyncDispatcher event handler]
fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved
container=container_1588684773609_0003_01_000001, on node=host: h1:1234
#containers=1 available=<memory:3072, vCores:7> used=<memory:5120, vCores:1>
with resource=<memory:5120, vCores:1>
RESERVED=[(Application=appattempt_1588684773609_0003_000001;
Node=h1:1234; Resource=<memory:5120, vCores:1>)]
2020-05-05 18:49:38,283 DEBUG [Time-limited test]
allocator.RegularContainerAllocator
(RegularContainerAllocator.java:assignContainer(514)) - assignContainers:
node=h2 application=application_1588684773609_0003 priority=0
pendingAsk=<per-allocation-resource=<memory:5120, vCores:1>,repeat=1>
type=OFF_SWITCH
2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp
(FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate
from reserved container container_1588684773609_0003_01_000001, but node is not
reserved
ALLOCATED=[(Application=appattempt_1588684773609_0003_000001;
Node=h2:1234; Resource=<memory:5120, vCores:1>)]
{code}
After reverting the YARN-8127 fix, the app3 AM is allocated as expected. A test
case that reproduces the issue is attached.
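
The rejection seen in the logs can be modeled roughly as follows. This is a simplified sketch inferred from the log message and the description above, not the actual FiCaSchedulerApp source: the class ReservationCheckModel and the method acceptStrict are hypothetical, and the real check may differ in its details.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the rejection in the logs. Hypothetical; not Hadoop code. */
final class ReservationCheckModel {
    /** Node name mapped to the id of the container reserved there, if any. */
    private final Map<String, String> reservedOnNode = new HashMap<>();

    /** Records that containerId is reserved on node. */
    void reserve(String node, String containerId) {
        reservedOnNode.put(node, containerId);
    }

    /**
     * Strict check (assumed behaviour): a proposal that allocates from a
     * reserved container is accepted only when the target node is the node
     * holding that reservation.
     */
    boolean acceptStrict(String targetNode, String fromReservedContainer) {
        if (fromReservedContainer == null) {
            return true; // plain allocation, no reservation to verify
        }
        return fromReservedContainer.equals(reservedOnNode.get(targetNode));
    }

    public static void main(String[] args) {
        ReservationCheckModel model = new ReservationCheckModel();
        String reserved = "container_1588684773609_0003_01_000001";
        model.reserve("h1", reserved);

        // Proposal on the reserving node passes; proposal on h2 is rejected.
        System.out.println("h1 accepted: " + model.acceptStrict("h1", reserved));
        System.out.println("h2 accepted: " + model.acceptStrict("h2", reserved));
    }
}
```

In this model the proposal for h2 is rejected even though h2 has free space, which matches the "node is not reserved" message. With multi-node placement the candidate set contains both h1 and h2, so a strict same-node check of this kind leaves the reservation stuck on h1.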