Zhijie Shen created YARN-389:
--------------------------------

             Summary: Infinitely assigning containers when the required resource exceeds the cluster's absolute capacity
                 Key: YARN-389
                 URL: https://issues.apache.org/jira/browse/YARN-389
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Zhijie Shen
            Assignee: Zhijie Shen


I ran the wordcount example on branch-2 and trunk with 
yarn.nodemanager.resource.memory-mb set to 1G and yarn.app.mapreduce.am.resource.mb 
set to 1.5G. The ResourceManager therefore has to allocate a 2G container for the 
AM (the 1.5G request is rounded up to a multiple of the 1G minimum allocation). 
However, the whole cluster has only 1G of memory (clusterResources: <memory:1024>), 
so the container can never be allocated. The problem is that the scheduler keeps 
retrying the assignment indefinitely instead of giving up. See the following log.

{code}
2013-02-07 19:05:05,947 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
applicationId: 1
2013-02-07 19:05:06,477 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Storing 
Application with id application_1360292699925_0001
2013-02-07 19:05:06,479 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing 
info for app: application_1360292699925_0001
2013-02-07 19:05:06,479 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with 
id 1 submitted by user zshen
2013-02-07 19:05:06,481 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=zshen    
IP=127.0.0.1    OPERATION=Submit Application Request    TARGET=ClientRMService  
RESULT=SUCCESS  APPID=application_1360292699925_0001
2013-02-07 19:05:06,493 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1360292699925_0001 State change from NEW to SUBMITTED
2013-02-07 19:05:06,494 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
Registering appattempt_1360292699925_0001_000001
2013-02-07 19:05:06,495 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1360292699925_0001_000001 State change from NEW to SUBMITTED
2013-02-07 19:05:06,506 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
Application application_1360292699925_0001 from user: zshen activated in queue: 
default
2013-02-07 19:05:06,506 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
Application added - appId: application_1360292699925_0001 user: 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@4965d0e0,
 leaf-queue: default #user-pending-applications: 0 #user-active-applications: 1 
#queue-pending-applications: 0 #queue-active-applications: 1
2013-02-07 19:05:06,506 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
Application added - appId: application_1360292699925_0001 user: zshen 
leaf-queue of parent: root #applications: 1
2013-02-07 19:05:06,506 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Application Submission: appattempt_1360292699925_0001_000001, user: zshen 
queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, 
vCores:0>usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, 
numContainers=0, currently active: 1
2013-02-07 19:05:06,508 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1360292699925_0001_000001 State change from SUBMITTED to SCHEDULED
2013-02-07 19:05:06,509 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1360292699925_0001 State change from SUBMITTED to ACCEPTED
2013-02-07 19:05:07,163 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:08,164 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:09,167 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:10,168 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:11,170 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:12,173 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:13,175 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:14,177 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 19:05:15,179 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
...
2013-02-07 23:51:02,976 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:03,977 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:04,978 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:05,979 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:06,981 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:07,982 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
2013-02-07 23:51:08,983 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, 
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1> 
potentialNewCapacity: 2.0 (  max-capacity: 1.0)
...
{code}
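The repeated LeafQueue line follows from simple arithmetic. The sketch below assumes a minimum allocation of 1024 MB (yarn.scheduler.minimum-allocation-mb) and treats potentialNewCapacity as (used + required) / cluster memory, which reproduces the numbers in the log. CapacityCheck and its methods are hypothetical names for illustration, not the CapacityScheduler's actual code.

```java
// Hypothetical sketch (not YARN code): reproduces the arithmetic behind the
// repeated LeafQueue log line. Assumes a 1024 MB minimum allocation, which is
// what turns the 1536 MB AM request into the 2048 MB seen in the log.
public class CapacityCheck {

    // Round a memory request up to the next multiple of the minimum allocation.
    static int normalize(int requestedMb, int minAllocationMb) {
        return ((requestedMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
    }

    // Fraction of the cluster the queue would use if the request were granted.
    static double potentialNewCapacity(int usedMb, int requiredMb, int clusterMb) {
        return (double) (usedMb + requiredMb) / clusterMb;
    }

    public static void main(String[] args) {
        int clusterMb = 1024;   // yarn.nodemanager.resource.memory-mb (single NM)
        int amRequestMb = 1536; // yarn.app.mapreduce.am.resource.mb
        int minAllocationMb = 1024; // assumed minimum allocation
        double maxCapacity = 1.0;   // max-capacity of the default queue

        int required = normalize(amRequestMb, minAllocationMb);
        double potential = potentialNewCapacity(0, required, clusterMb);
        System.out.println("required=" + required
                + " potentialNewCapacity=" + potential
                + " fits=" + (potential <= maxCapacity));
    }
}
```

Running it prints `required=2048 potentialNewCapacity=2.0 fits=false`. Since none of these inputs change between scheduling rounds, the check fails the same way every time, which is why the identical line appears roughly once per second in the log.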

In my opinion, the scheduler should stop trying to assign the container in the 
following two cases:
1. Required > cluster's absolute capacity: the assignment can never succeed, so 
it should fail immediately.
2. Required + already used > cluster's absolute capacity: the assignment may 
succeed once other containers release resources, so it should fail only after a 
certain number of assignment rounds or a certain duration. Both the number of 
rounds and the duration should be configurable.
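A minimal sketch of the two proposed rules, assuming memory is the only resource considered. AssignmentPolicy, Decision, maxAttempts and maxWait are hypothetical names standing in for whatever configuration a fix would introduce; they are not existing YARN classes or properties.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the proposed termination rules; not YARN code.
// maxAttempts and maxWait stand in for the configurable limits.
public class AssignmentPolicy {

    enum Decision { ASSIGN, RETRY, FAIL }

    private final int maxAttempts;
    private final Duration maxWait;

    AssignmentPolicy(int maxAttempts, Duration maxWait) {
        this.maxAttempts = maxAttempts;
        this.maxWait = maxWait;
    }

    // requiredMb: normalized request size; usedMb: memory already allocated;
    // clusterMb: cluster's absolute capacity; attempts: rounds tried so far.
    Decision decide(long requiredMb, long usedMb, long clusterMb,
                    int attempts, Instant firstTried, Instant now) {
        // Case 1: cannot fit even in an empty cluster, so fail at once.
        if (requiredMb > clusterMb) {
            return Decision.FAIL;
        }
        // Enough free memory right now.
        if (requiredMb + usedMb <= clusterMb) {
            return Decision.ASSIGN;
        }
        // Case 2: may fit later; give up after too many rounds or too long a wait.
        boolean tooManyRounds = attempts >= maxAttempts;
        boolean waitedTooLong = Duration.between(firstTried, now).compareTo(maxWait) > 0;
        return (tooManyRounds || waitedTooLong) ? Decision.FAIL : Decision.RETRY;
    }

    public static void main(String[] args) {
        AssignmentPolicy policy = new AssignmentPolicy(100, Duration.ofMinutes(10));
        Instant t0 = Instant.now();
        // The scenario from the log: 2048 MB requested on a 1024 MB cluster.
        System.out.println(policy.decide(2048, 0, 1024, 1, t0, t0));
        // A request that could fit once 768 MB in use is released: keep retrying.
        System.out.println(policy.decide(512, 768, 1024, 1, t0, t0));
    }
}
```

The `main` method prints `FAIL` and then `RETRY`. Enforcing whichever limit is hit first (round count or elapsed time) is one reading of the proposal; an actual patch might expose only one of the two.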

