Nishan Shetty created YARN-2283:
-----------------------------------

             Summary: RM failed to release the AM container
                 Key: YARN-2283
                 URL: https://issues.apache.org/jira/browse/YARN-2283
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.4.0
         Environment: NM1: AM running
NM2: Map task running
mapreduce.map.maxattempts=1
            Reporter: Nishan Shetty
            Priority: Critical


During a container stability test I hit the following problem.

While the job was running, the map task got killed.

Even though the application is FAILED, the MRAppMaster process keeps running 
until it times out, because the RM did not release the AM container.

{code}
2014-07-14 14:43:33,899 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1405318134611_0002_01_000005 Container Transitioned from RUNNING to 
COMPLETED
2014-07-14 14:43:33,899 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
 Completed container: container_1405318134611_0002_01_000005 in state: 
COMPLETED event:FINISHED
2014-07-14 14:43:33,899 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos   
OPERATION=AM Released Container TARGET=SchedulerApp     RESULT=SUCCESS  
APPID=application_1405318134611_0002    
CONTAINERID=container_1405318134611_0002_01_000005
2014-07-14 14:43:33,899 INFO 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
 Finish information of container container_1405318134611_0002_01_000005 is 
written
2014-07-14 14:43:33,899 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: 
Stored the finish data of container container_1405318134611_0002_01_000005
2014-07-14 14:43:33,899 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
 Released container container_1405318134611_0002_01_000005 of capacity 
<memory:1024, vCores:1> on host HOST-10-18-40-153:45026, which currently has 1 
containers, <memory:2048, vCores:1> used and <memory:6144, vCores:7> available, 
release resources=true
2014-07-14 14:43:33,899 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
default used=<memory:2048, vCores:1> numContainers=1 user=testos 
user-resources=<memory:2048, vCores:1>
2014-07-14 14:43:33,899 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
completedContainer container=Container: [ContainerId: 
container_1405318134611_0002_01_000005, NodeId: HOST-10-18-40-153:45026, 
NodeHttpAddress: HOST-10-18-40-153:45025, Resource: <memory:1024, vCores:1>, 
Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 
}, ] queue=default: capacity=1.0, absoluteCapacity=1.0, 
usedResources=<memory:2048, vCores:1>, usedCapacity=0.25, 
absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster=<memory:8192, 
vCores:8>
2014-07-14 14:43:33,899 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 
used=<memory:2048, vCores:1> cluster=<memory:8192, vCores:8>
2014-07-14 14:43:33,899 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
Re-sorting completed queue: root.default stats: default: capacity=1.0, 
absoluteCapacity=1.0, usedResources=<memory:2048, vCores:1>, usedCapacity=0.25, 
absoluteUsedCapacity=0.25, numApps=1, numContainers=1
2014-07-14 14:43:33,899 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Application attempt appattempt_1405318134611_0002_000001 released container 
container_1405318134611_0002_01_000005 on node: host: HOST-10-18-40-153:45026 
#containers=1 available=6144 used=2048 with event: FINISHED
2014-07-14 14:43:34,924 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Updating application attempt appattempt_1405318134611_0002_000001 with final 
state: FINISHING
2014-07-14 14:43:34,924 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1405318134611_0002_000001 State change from RUNNING to FINAL_SAVING
2014-07-14 14:43:34,924 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating 
application application_1405318134611_0002 with final state: FINISHING
2014-07-14 14:43:34,947 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher 
event type: NodeDataChanged with state:SyncConnected for 
path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_000001
 for Service 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2014-07-14 14:43:34,947 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1405318134611_0002 State change from RUNNING to FINAL_SAVING
2014-07-14 14:43:34,947 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing 
info for app: application_1405318134611_0002
2014-07-14 14:43:34,947 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1405318134611_0002_000001 State change from FINAL_SAVING to FINISHING
2014-07-14 14:43:35,012 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher 
event type: NodeDataChanged with state:SyncConnected for 
path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002 for 
Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in 
state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: 
STARTED
2014-07-14 14:43:35,013 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1405318134611_0002 State change from FINAL_SAVING to FINISHING
{code}
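
To make the state transitions in a log like the one above easier to follow, here is a minimal Python sketch that pulls out every "State change from X to Y" line and reports the last state seen for each application and attempt. It is only an illustration for triage: the function names `last_states` and `stuck_in_finishing` are hypothetical and are not part of YARN. The pattern assumes the message format shown in the excerpt and tolerates the line wrapping added by the mail archive.

```python
import re

# Matches RM messages such as
#   "appattempt_..._000001 State change from RUNNING to FINAL_SAVING".
# \s+ is used between tokens because the archived log lines are wrapped.
STATE_CHANGE = re.compile(
    r"(?P<id>(?:application|appattempt)_\d+_\d+(?:_\d+)?)\s+"
    r"State\s+change\s+from\s+(?P<old>\w+)\s+to\s+(?P<new>\w+)"
)


def last_states(log_text):
    """Return {id: last state seen} for each application / attempt id."""
    states = {}
    for match in STATE_CHANGE.finditer(log_text):
        # Later matches overwrite earlier ones, so the final value is
        # the most recent state logged for that id.
        states[match.group("id")] = match.group("new")
    return states


def stuck_in_finishing(log_text):
    """Return the sorted ids whose last logged state is FINISHING."""
    return sorted(
        ident for ident, state in last_states(log_text).items()
        if state == "FINISHING"
    )
```

Passing the text of the RM log to `stuck_in_finishing` lists the ids that never moved past FINISHING within that log. For the excerpt above, both the application and its attempt end in FINISHING.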


