Zhijie Shen created YARN-389:
--------------------------------
Summary: Infinitely assigning containers when the required
resource exceeds the cluster's absolute capacity
Key: YARN-389
URL: https://issues.apache.org/jira/browse/YARN-389
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
I ran the wordcount example on branch-2 and trunk with
yarn.nodemanager.resource.memory-mb set to 1G and yarn.app.mapreduce.am.resource.mb
set to 1.5G. The ResourceManager therefore tries to assign a 2G container to the
AM (required <memory:2048> in the log), but the cluster has only 1G in total
(clusterResources: <memory:1024>), so the container can never be assigned.
The problem is that the assignment attempt is repeated indefinitely when it
cannot succeed. See the following log.
{code}
2013-02-07 19:05:05,947 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new
applicationId: 1
2013-02-07 19:05:06,477 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Storing
Application with id application_1360292699925_0001
2013-02-07 19:05:06,479 INFO
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing
info for app: application_1360292699925_0001
2013-02-07 19:05:06,479 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with
id 1 submitted by user zshen
2013-02-07 19:05:06,481 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=zshen
IP=127.0.0.1 OPERATION=Submit Application Request TARGET=ClientRMService
RESULT=SUCCESS APPID=application_1360292699925_0001
2013-02-07 19:05:06,493 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1360292699925_0001 State change from NEW to SUBMITTED
2013-02-07 19:05:06,494 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Registering appattempt_1360292699925_0001_000001
2013-02-07 19:05:06,495 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1360292699925_0001_000001 State change from NEW to SUBMITTED
2013-02-07 19:05:06,506 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Application application_1360292699925_0001 from user: zshen activated in queue:
default
2013-02-07 19:05:06,506 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Application added - appId: application_1360292699925_0001 user:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@4965d0e0,
leaf-queue: default #user-pending-applications: 0 #user-active-applications: 1
#queue-pending-applications: 0 #queue-active-applications: 1
2013-02-07 19:05:06,506 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Application added - appId: application_1360292699925_0001 user: zshen
leaf-queue of parent: root #applications: 1
2013-02-07 19:05:06,506 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Application Submission: appattempt_1360292699925_0001_000001, user: zshen
queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0,
vCores:0>usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1,
numContainers=0, currently active: 1
2013-02-07 19:05:06,508 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1360292699925_0001_000001 State change from SUBMITTED to SCHEDULED
2013-02-07 19:05:06,509 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1360292699925_0001 State change from SUBMITTED to ACCEPTED
2013-02-07 19:05:07,163 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 19:05:08,164 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 19:05:09,167 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 19:05:10,168 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 19:05:11,170 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 19:05:12,173 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 19:05:13,175 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 19:05:14,177 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 19:05:15,179 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
...
2013-02-07 23:51:02,976 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 23:51:03,977 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 23:51:04,978 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 23:51:05,979 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 23:51:06,981 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 23:51:07,982 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
2013-02-07 23:51:08,983 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024,
vCores:16> currentCapacity 0.0 required <memory:2048, vCores:1>
potentialNewCapacity: 2.0 ( max-capacity: 1.0)
...
{code}
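The numbers in the repeated LeafQueue line can be reproduced with a little arithmetic. The sketch below is only an illustration, not the scheduler's code: the class and method names are hypothetical, and it assumes that the 1.5G AM request is rounded up to a multiple of a 1024 MB minimum allocation and that potentialNewCapacity is (used + required) / cluster memory.

```java
public class CapacityArithmetic {
    // Assumption: a request is rounded up to the next multiple of the
    // scheduler's minimum allocation (taken here to be 1024 MB).
    static long roundUp(long requestedMb, long minAllocationMb) {
        return ((requestedMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
    }

    // Assumption: the capacity the queue would reach if the request were granted,
    // expressed as a fraction of the cluster's memory.
    static double potentialNewCapacity(long usedMb, long requiredMb, long clusterMb) {
        return (double) (usedMb + requiredMb) / clusterMb;
    }

    public static void main(String[] args) {
        long required = roundUp(1536, 1024);                        // 1.5G AM request
        double potential = potentialNewCapacity(0, required, 1024); // 1G cluster, nothing used
        // prints "required=2048 potentialNewCapacity=2.0"
        System.out.println("required=" + required + " potentialNewCapacity=" + potential);
    }
}
```

Since 2.0 is greater than the max-capacity of 1.0 and nothing is in use, no amount of waiting changes the outcome of the check.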
In my opinion, the attempt to assign a container should be terminated in the
following two cases.
1. Required > Cluster's absolute capacity: the assignment can never succeed, so
it should fail immediately.
2. Required + Already used > Cluster's absolute capacity: the assignment should
fail after a certain number of assignment rounds or after a certain duration.
The number of rounds or the duration should be configurable.
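The two cases could be sketched roughly as follows. This is a minimal illustration of the proposal, not a patch against the CapacityScheduler: the class, the enum, and the maxRounds parameter are hypothetical names, and only the round-count variant of case 2 is shown.

```java
public class AssignmentPolicy {
    enum Decision { ASSIGN, RETRY, FAIL }

    // requiredMb: memory requested for the container
    // usedMb:     memory already in use in the cluster
    // clusterMb:  the cluster's absolute capacity
    // rounds:     assignment rounds already attempted for this request
    // maxRounds:  hypothetical configurable limit on retries
    static Decision decide(long requiredMb, long usedMb, long clusterMb,
                           int rounds, int maxRounds) {
        // Case 1: the request alone exceeds the cluster, so it can never fit.
        if (requiredMb > clusterMb) {
            return Decision.FAIL;
        }
        // Case 2: the request may fit once resources are released, so retry
        // up to the configured number of rounds and then give up.
        if (requiredMb + usedMb > clusterMb) {
            return rounds >= maxRounds ? Decision.FAIL : Decision.RETRY;
        }
        return Decision.ASSIGN;
    }

    public static void main(String[] args) {
        // The situation in the log: 2048 MB required, 1024 MB cluster, nothing used.
        System.out.println(decide(2048, 0, 1024, 0, 10));
    }
}
```

With the values from the log the first check already applies, so the request would be rejected on the first round instead of being retried every second.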