[jira] [Updated] (YARN-2283) RM failed to release the AM container
[ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2283:

Affects Version/s: (was: 2.5.0) 2.4.0

RM failed to release the AM container
-------------------------------------
Key: YARN-2283
URL: https://issues.apache.org/jira/browse/YARN-2283
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Environment: NM1: AM running
NM2: Map task running
mapreduce.map.maxattempts=1
Reporter: Nishan Shetty
Priority: Critical

During a container stability test I faced this problem.
While the job was running, a map task got killed.
Observe that even though the application is FAILED, the MRAppMaster process keeps running until timeout because the RM did not release the AM container.

{code}
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1405318134611_0002_01_05 Container Transitioned from RUNNING to COMPLETED
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1405318134611_0002_01_05 in state: COMPLETED event:FINISHED
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1405318134611_0002 CONTAINERID=container_1405318134611_0002_01_05
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Finish information of container container_1405318134611_0002_01_05 is written
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: Stored the finish data of container container_1405318134611_0002_01_05
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Released container container_1405318134611_0002_01_05 of capacity memory:1024, vCores:1 on host HOST-10-18-40-153:45026, which currently has 1 containers, memory:2048, vCores:1 used and memory:6144, vCores:7 available, release resources=true
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used=memory:2048, vCores:1 numContainers=1 user=testos user-resources=memory:2048, vCores:1
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, NodeHttpAddress: HOST-10-18-40-153:45025, Resource: memory:1024, vCores:1, Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=memory:2048, vCores:1, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster=memory:8192, vCores:8
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used=memory:2048, vCores:1 cluster=memory:8192, vCores:8
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=memory:2048, vCores:1, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1405318134611_0002_01 released container container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 #containers=1 available=6144 used=2048 with event: FINISHED
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1405318134611_0002_01 with final state: FINISHING
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1405318134611_0002 with final state: FINISHING
2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_01
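
The report says the RM never released the AM container, while the log excerpt only shows a release for container_1405318134611_0002_01_05. One way to check this against a full RM log might be a small script like the sketch below. It is not part of the original report: the two regular expressions are modeled only on the RMContainerImpl and FiCaSchedulerNode lines quoted above, the function names are made up for illustration, and the AM container ID used in the sample is hypothetical, since the report does not state it.

```python
import re

# Modeled on the RMContainerImpl transition lines in the excerpt (assumed format).
TRANSITION = re.compile(
    r"RMContainerImpl: (container_\S+) Container Transitioned from (\w+) to (\w+)"
)
# Modeled on the FiCaSchedulerNode release lines in the excerpt (assumed format).
RELEASED = re.compile(r"FiCaSchedulerNode: Released container (container_\S+)")


def summarize(lines):
    """Return (last seen state per container, set of released container IDs)."""
    states = {}
    released = set()
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            states[match.group(1)] = match.group(3)
        match = RELEASED.search(line)
        if match:
            released.add(match.group(1))
    return states, released


def unreleased(lines, expected_ids):
    """Return the expected container IDs that have no release line in the log."""
    _, released = summarize(lines)
    return sorted(set(expected_ids) - released)


# Two lines taken from the excerpt (the second one abridged after the host name).
sample = [
    "2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager."
    "rmcontainer.RMContainerImpl: container_1405318134611_0002_01_05 "
    "Container Transitioned from RUNNING to COMPLETED",
    "2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager."
    "scheduler.common.fica.FiCaSchedulerNode: Released container "
    "container_1405318134611_0002_01_05 of capacity memory:1024, vCores:1 on host "
    "HOST-10-18-40-153:45026",
]

# Hypothetical AM container ID, used only to illustrate the check.
AM_CONTAINER = "container_1405318134611_0002_01_01"
MAP_CONTAINER = "container_1405318134611_0002_01_05"

states, released = summarize(sample)
missing = unreleased(sample, [AM_CONTAINER, MAP_CONTAINER])
print(states, released, missing)
```

With the two sample lines, only the map task container appears as released, and the hypothetical AM container is reported as missing, which is the symptom described in the report.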
[jira] [Updated] (YARN-2283) RM failed to release the AM container
[ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2283:

Affects Version/s: (was: 2.4.0) 2.5.0

RM failed to release the AM container
-------------------------------------
Key: YARN-2283
URL: https://issues.apache.org/jira/browse/YARN-2283
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.5.0
Environment: NM1: AM running
NM2: Map task running
mapreduce.map.maxattempts=1
Reporter: Nishan Shetty
Priority: Critical

During a container stability test I faced this problem.
While the job was running, a map task got killed.
Observe that even though the application is FAILED, the MRAppMaster process keeps running until timeout because the RM did not release the AM container.

{code}
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1405318134611_0002_01_05 Container Transitioned from RUNNING to COMPLETED
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1405318134611_0002_01_05 in state: COMPLETED event:FINISHED
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1405318134611_0002 CONTAINERID=container_1405318134611_0002_01_05
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Finish information of container container_1405318134611_0002_01_05 is written
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: Stored the finish data of container container_1405318134611_0002_01_05
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Released container container_1405318134611_0002_01_05 of capacity memory:1024, vCores:1 on host HOST-10-18-40-153:45026, which currently has 1 containers, memory:2048, vCores:1 used and memory:6144, vCores:7 available, release resources=true
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used=memory:2048, vCores:1 numContainers=1 user=testos user-resources=memory:2048, vCores:1
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, NodeHttpAddress: HOST-10-18-40-153:45025, Resource: memory:1024, vCores:1, Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=memory:2048, vCores:1, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster=memory:8192, vCores:8
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used=memory:2048, vCores:1 cluster=memory:8192, vCores:8
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=memory:2048, vCores:1, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1405318134611_0002_01 released container container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 #containers=1 available=6144 used=2048 with event: FINISHED
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1405318134611_0002_01 with final state: FINISHING
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1405318134611_0002 with final state: FINISHING
2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_01