Charan Hebri created YARN-8138:
----------------------------------
Summary: No containers pre-empted from another queue when using
node labels
Key: YARN-8138
URL: https://issues.apache.org/jira/browse/YARN-8138
Project: Hadoop YARN
Issue Type: Bug
Reporter: Charan Hebri
There seems to be an issue with pre-emption when using node labels with queue
priority.
Test configuration:
queue A (capacity=50, priority=1)
queue B (capacity=50, priority=2)
both have accessible-node-labels set to x
A.accessible-node-labels.x.capacity = 50
B.accessible-node-labels.x.capacity = 50
In addition, the pre-emption related properties have been set.
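For reference, the queue settings above can be spelled out as full CapacityScheduler property keys. This is only a sketch: it assumes queues A and B sit directly under root (the report does not give the queue paths), and the helper name queue_properties is made up for illustration. The pre-emption properties are left out because the report does not list them.

```python
# Assumption: A and B are direct children of root; the report does not say so.
PREFIX = "yarn.scheduler.capacity.root"


def queue_properties(queue, capacity, priority, label="x", label_capacity=50):
    """Build the capacity-scheduler.xml keys for one queue (hypothetical helper)."""
    base = f"{PREFIX}.{queue}"
    return {
        f"{base}.capacity": str(capacity),
        f"{base}.priority": str(priority),
        f"{base}.accessible-node-labels": label,
        f"{base}.accessible-node-labels.{label}.capacity": str(label_capacity),
    }


# Values from the test configuration above.
props = {}
props.update(queue_properties("A", capacity=50, priority=1))
props.update(queue_properties("B", capacity=50, priority=2))

for key in sorted(props):
    print(f"{key}={props[key]}")
```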
Test steps:
- Set NM memory = 6000MB and containerMemory = 750MB
- Submit an application A1 to B, with am-container = container =
(6000-750-1500), no. of containers = 2
- Submit an application A2 to A, with am-container = 750, container = 1500, no.
of containers = (NUM_NM-1)
- Kill application A1
- Submit an application A3 to B with am-container=container=5000, no. of
containers=3
- Expected: containers are pre-empted from application A2 to make room for A3.
Actual: no container pre-emption happens
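The memory arithmetic behind these steps can be sketched as follows. This is a rough illustration, not a reproduction of the scheduler's logic: the helper fits is hypothetical, the container placement (one A2 container per node) is an assumption the report does not state, and any rounding the scheduler applies to requests is ignored.

```python
# Values taken from the test steps above (all in MB).
NM_MEMORY = 6000
CONTAINER_MEMORY = 750

# A1: the AM and each container ask for (6000 - 750 - 1500) MB.
a1_container = NM_MEMORY - CONTAINER_MEMORY - 1500

# A2: AM = 750 MB, each container = 1500 MB.
a2_am = 750
a2_container = 1500

# A3: the AM and each container ask for 5000 MB.
a3_container = 5000


def fits(request_mb, used_mb, node_mb=NM_MEMORY):
    """Return True if request_mb fits on a node that already uses used_mb."""
    return request_mb <= node_mb - used_mb


# Assumption: the node hosts exactly one A2 container and nothing else.
print("A1 container size:", a1_container)
print("A3 fits next to an A2 container:", fits(a3_container, a2_container))
print("A3 fits next to the A2 AM:", fits(a3_container, a2_am))
print("A3 fits on an empty node:", fits(a3_container, 0))
```

Under that placement assumption, a node running a 1500 MB container of A2 has only 4500 MB free, so a 5000 MB request from A3 can be satisfied there only if the A2 container is pre-empted.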
Container pre-emption is stuck, and the following messages repeat in the RM log:
{noformat}
2018-02-02 11:41:36,974 INFO capacity.CapacityScheduler
(CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler
(CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to
fulfill reservation for application application_1517571510094_0003 on node:
XXXXXXXXXX:25454
2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) -
Reserved container application=application_1517571510094_0003
resource=<memory:3072, vCores:1>
queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
cluster=<memory:18000, vCores:3>
2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler
(CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler
(CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to
fulfill reservation for application application_1517571510094_0003 on node:
XXXXXXXXXX:25454
2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) -
Reserved container application=application_1517571510094_0003
resource=<memory:3072, vCores:1>
queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
cluster=<memory:18000, vCores:3>
2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler
(CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
2018-02-02 11:41:36,994 INFO capacity.CapacityScheduler
(CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to
fulfill reservation for application application_1517571510094_0003 on node:
XXXXXXXXXX:25454
2018-02-02 11:41:36,995 INFO allocator.AbstractContainerAllocator
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) -
Reserved container application=application_1517571510094_0003
resource=<memory:3072, vCores:1>
queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
cluster=<memory:18000, vCores:3>
{noformat}