[jira] [Updated] (YARN-2283) RM failed to release the AM container

2014-08-25 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2283:


Affects Version/s: (was: 2.5.0)
   2.4.0

 RM failed to release the AM container
 -------------------------------------

 Key: YARN-2283
 URL: https://issues.apache.org/jira/browse/YARN-2283
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
 Environment: NM1: AM running
 NM2: Map task running
 mapreduce.map.maxattempts=1
Reporter: Nishan Shetty
Priority: Critical

 During a container stability test I hit this problem.
 While the job was running, the map task got killed.
 Observe that even though the application is FAILED, the MRAppMaster process keeps 
 running until timeout because the RM did not release the AM container.
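 When checking an RM log like the one in this report, it can help to list only the state transitions, since a release of the AM container would be expected to show up as a transition for that container. The following is a minimal sketch, not part of the original report: the function name `state_transitions` and the regular expression are illustrative assumptions, written against the two message shapes visible in this log ("Container Transitioned from X to Y" and "State change from X to Y"). It collapses whitespace first because the mail archive wraps long log lines.

```python
import re

# Hypothetical helper, not part of Hadoop: matches the two RM state-change
# message shapes that appear in this report's log excerpt, e.g.
#   "<id> Container Transitioned from RUNNING to COMPLETED"
#   "<id> State change from RUNNING to FINAL_SAVING"
TRANSITION = re.compile(
    r"(?P<entity>(?:container|appattempt|application)_\w+)\s+"
    r"(?:Container Transitioned|State change) from "
    r"(?P<old>\w+) to (?P<new>\w+)"
)


def state_transitions(log_text):
    """Return (entity_id, old_state, new_state) tuples in log order."""
    # The mail archive wraps long log lines, so collapse all whitespace
    # into single spaces before matching.
    flat = " ".join(log_text.split())
    return [(m["entity"], m["old"], m["new"]) for m in TRANSITION.finditer(flat)]


# Two entries copied from the log in this report, wrapped as in the email.
sample = """
 2014-07-14 14:43:33,899 INFO
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
 container_1405318134611_0002_01_05 Container Transitioned from RUNNING to
 COMPLETED
 2014-07-14 14:43:34,924 INFO
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
 appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING
"""

for entity, old, new in state_transitions(sample):
    print(entity, old, "->", new)
```

 Run against the full excerpt, a filter like this would list only the transitions for the map container and the application attempt, which are the only two state changes the excerpt records.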
 {code}
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_1405318134611_0002_01_05 Container Transitioned from RUNNING to 
 COMPLETED
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
  Completed container: container_1405318134611_0002_01_05 in state: 
 COMPLETED event:FINISHED
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos 
 OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
 APPID=application_1405318134611_0002
 CONTAINERID=container_1405318134611_0002_01_05
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
  Finish information of container container_1405318134611_0002_01_05 is 
 written
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: 
 Stored the finish data of container container_1405318134611_0002_01_05
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
  Released container container_1405318134611_0002_01_05 of capacity 
 memory:1024, vCores:1 on host HOST-10-18-40-153:45026, which currently has 
 1 containers, memory:2048, vCores:1 used and memory:6144, vCores:7 
 available, release resources=true
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 default used=memory:2048, vCores:1 numContainers=1 user=testos 
 user-resources=memory:2048, vCores:1
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 completedContainer container=Container: [ContainerId: 
 container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, 
 NodeHttpAddress: HOST-10-18-40-153:45025, Resource: memory:1024, vCores:1, 
 Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 
 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:2048, vCores:1, usedCapacity=0.25, 
 absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster=memory:8192, 
 vCores:8
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 
 used=memory:2048, vCores:1 cluster=memory:8192, vCores:8
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Re-sorting completed queue: root.default stats: default: capacity=1.0, 
 absoluteCapacity=1.0, usedResources=memory:2048, vCores:1, 
 usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
 2014-07-14 14:43:33,899 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application attempt appattempt_1405318134611_0002_01 released container 
 container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 
 #containers=1 available=6144 used=2048 with event: FINISHED
 2014-07-14 14:43:34,924 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Updating application attempt appattempt_1405318134611_0002_01 with final 
 state: FINISHING
 2014-07-14 14:43:34,924 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING
 2014-07-14 14:43:34,924 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating 
 application application_1405318134611_0002 with final state: FINISHING
 2014-07-14 14:43:34,947 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Watcher event type: NodeDataChanged with state:SyncConnected for 
 path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_01
  

[jira] [Updated] (YARN-2283) RM failed to release the AM container

2014-08-24 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2283:


Affects Version/s: (was: 2.4.0)
   2.5.0

 RM failed to release the AM container
 -------------------------------------

 Key: YARN-2283
 URL: https://issues.apache.org/jira/browse/YARN-2283
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
 Environment: NM1: AM running
 NM2: Map task running
 mapreduce.map.maxattempts=1
Reporter: Nishan Shetty
Priority: Critical
