Brandon Scheller created YARN-9449:
--------------------------------------

             Summary: Non-exclusive labels can create reservation loop on 
cluster without unlabeled node
                 Key: YARN-9449
                 URL: https://issues.apache.org/jira/browse/YARN-9449
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.8.5
            Reporter: Brandon Scheller


https://issues.apache.org/jira/browse/YARN-5342 added a counter to YARN so that 
unlabeled resource requests are first attempted on unlabeled nodes before 
falling back to non-exclusive labeled nodes.
This counter is reset only when a scheduling attempt happens on an unlabeled 
node.

On Hadoop clusters with only labeled nodes, the counter can never be reset, so 
the labeled node is never skipped. Because the node is not skipped, the 
scheduler repeatedly reserves and then unreserves the container, producing the 
loop shown in the YARN ResourceManager logs:
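The stuck-counter behavior can be illustrated with a toy model. This is NOT actual Hadoop code: the field name {{missedUnlabeledOpportunities}}, the {{THRESHOLD}} constant, and the event strings are illustrative assumptions, not the real YARN-5342 identifiers.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the reserve/unreserve loop described above.
 * All names here are hypothetical, not actual YARN classes or fields.
 */
public class ReservationLoopSketch {
    // Counter of missed opportunities on the default (unlabeled) partition.
    // In this model it is reset ONLY by an attempt on an unlabeled node,
    // mirroring the reset condition described in the bug report.
    int missedUnlabeledOpportunities = 0;
    static final int THRESHOLD = 3; // hypothetical fallback threshold
    final List<String> events = new ArrayList<>();

    /** One scheduling attempt triggered by a node heartbeat. */
    void heartbeat(boolean nodeIsLabeled) {
        if (!nodeIsLabeled) {
            // The only reset path: an attempt on an unlabeled node.
            missedUnlabeledOpportunities = 0;
            events.add("ALLOCATED");
            return;
        }
        if (missedUnlabeledOpportunities >= THRESHOLD) {
            // Fallback: allocate on the non-exclusive labeled node.
            events.add("ALLOCATED");
            return;
        }
        // With no unlabeled node in the cluster, the counter is stuck and
        // never reset, so every heartbeat reserves the container on the
        // labeled node and then unreserves it, as seen in the RM log.
        events.add("RESERVED");
        events.add("UNRESERVED");
    }

    public static void main(String[] args) {
        ReservationLoopSketch s = new ReservationLoopSketch();
        for (int i = 0; i < 5; i++) {
            s.heartbeat(true); // every node in this cluster is labeled
        }
        // "ALLOCATED" never appears; the events alternate RESERVED/UNRESERVED
        // indefinitely, matching the repeating log pattern below.
        System.out.println(s.events);
    }
}
```

In this sketch, nothing ever advances the counter toward the threshold once the cluster has no unlabeled nodes, so the scheduling decision is identical on every heartbeat.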

 

{noformat}
2019-02-18 23:54:22,591 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000023 Container Transitioned from NEW to RESERVED
2019-02-18 23:54:22,591 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
2019-02-18 23:54:22,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
2019-02-18 23:54:23,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
2019-02-18 23:54:23,592 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED
2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000024 Container Transitioned from NEW to RESERVED
2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
2019-02-18 23:54:23,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
2019-02-18 23:54:24,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
2019-02-18 23:54:24,593 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED
2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1550533628872_0003_01_000025 Container Transitioned from NEW to RESERVED
2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator (ResourceManager Event Processor): Reserved container application=application_1550533628872_0003 resource=<memory:11264, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6ffe0dc3 cluster=<memory:24576, vCores:16>
2019-02-18 23:54:24,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue (ResourceManager Event Processor): assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:24576, vCores:16>
2019-02-18 23:54:25,594 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1550533628872_0003 on node: ip-10-0-0-122.ec2.internal:8041
2019-02-18 23:54:25,595 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp (ResourceManager Event Processor): Application application_1550533628872_0003 unreserved on node host: ip-10-0-0-122.ec2.internal:8041 #containers=1 available=<memory:1024, vCores:7> used=<memory:11264, vCores:1>, currently has 0 at priority 1; currentReservation <memory:0, vCores:0> on node-label=LABELED
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
