Brandon Scheller created YARN-9449:
--------------------------------------
Summary: Non-exclusive labels can create reservation loop on
cluster without unlabeled node
Key: YARN-9449
URL: https://issues.apache.org/jira/browse/YARN-9449
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.8.5
Reporter: Brandon Scheller
https://issues.apache.org/jira/browse/YARN-5342 added a counter to YARN so that unscheduled resource requests are first attempted on unlabeled nodes.
This counter is reset only when a scheduling attempt happens on an unlabeled node.
On Hadoop clusters that have only labeled nodes, the counter can therefore never be reset, which prevents the labeled node from ever being skipped.
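A rough sketch of the counter behavior described above (class, field, and method names are illustrative only, and where the counter is incremented is an assumption, not the actual CapacityScheduler/RegularContainerAllocator code):

{code:java}
// Simplified illustration of the missed-opportunity counter; hypothetical names.
public class MissedOpportunitySketch {

  private int missedUnlabeledOpportunities = 0;

  /**
   * Decides whether a non-exclusive request may be placed on a labeled node.
   * Called once per node that the scheduler considers for the request.
   */
  boolean mayUseLabeledNode(boolean nodeIsUnlabeled, int numClusterNodes) {
    if (nodeIsUnlabeled) {
      // The only reset point: a scheduling attempt on an unlabeled node.
      missedUnlabeledOpportunities = 0;
      return false;
    }
    // On a cluster with only labeled nodes this branch is the only one taken,
    // so the counter is never reset; once it passes the threshold the labeled
    // node is never skipped again, even when the request cannot fit on it.
    missedUnlabeledOpportunities++;
    return missedUnlabeledOpportunities >= numClusterNodes;
  }
}
{code}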
Because the node is never skipped, the scheduler enters the loop shown in the YARN RM log:
{noformat}
2019-02-18 23:54:22,591 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000023 Container Transitioned from NEW to RESERVED
2019-02-18 23:54:22,591 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
2019-02-18 23:54:22,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
2019-02-18 23:54:23,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
2019-02-18 23:54:23,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED
2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000024 Container Transitioned from NEW to RESERVED
2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
2019-02-18 23:54:24,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
2019-02-18 23:54:24,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED
2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000025 Container Transitioned from NEW to RESERVED
2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
2019-02-18 23:54:25,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
2019-02-18 23:54:25,595 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED
{noformat}
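The cycle above repeats once per scheduling heartbeat. A toy, self-contained simulation of that cycle (the resource numbers are taken from the log; everything else is illustrative and not actual scheduler code):

{code:java}
// Toy simulation of the reserve/unreserve cycle shown in the log above.
public class ReservationLoopSimulation {

  public static void main(String[] args) {
    int requiredMb = 11264;   // from the log: resource=<memory:11264, vCores:1>
    int availableMb = 1024;   // from the log: available=<memory:1024, vCores:7>
    // On a label-only cluster the missed-opportunity counter is never reset,
    // so the labeled node is never skipped and this stays false forever.
    boolean skipLabeledNode = false;

    for (int heartbeat = 0; heartbeat < 3 && !skipLabeledNode; heartbeat++) {
      System.out.println("Container Transitioned from NEW to RESERVED");
      System.out.println("Trying to fulfill reservation on the labeled node");
      if (availableMb < requiredMb) {
        // The reservation can never be fulfilled, so it is dropped and, because
        // the node is not skipped, a new container is reserved on the next pass.
        System.out.println("Application unreserved on node");
      }
    }
  }
}
{code}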