[jira] [Commented] (YARN-2283) RM failed to release the AM container

2014-07-31 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081170#comment-14081170
 ] 

Sunil G commented on YARN-2283:
---

Thank you, [~jlowe]. Yes, I took a thread dump and could see that the 
ThreadPoolExecutor is still there. I applied the patch and verified that 
the problem no longer occurs.
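
For readers following the thread: a leftover ThreadPoolExecutor shows up in a 
thread dump because its worker threads are normally non-daemon and stay alive 
until the pool is shut down, which keeps the JVM process running. The snippet 
below is only a minimal standalone sketch of that general Java behaviour. It is 
not the MRAppMaster code and not the patch mentioned above; the class name, the 
helper names and the "demo-pool" thread name are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.stream.Collectors;

// Hypothetical demo class; nothing here comes from the Hadoop code base.
class LingeringPoolDemo {

    /** Names of live, non-daemon threads whose name starts with the given prefix. */
    static List<String> liveNonDaemonThreads(String namePrefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && !t.isDaemon())
                .map(Thread::getName)
                .filter(n -> n.startsWith(namePrefix))
                .sorted()
                .collect(Collectors.toList());
    }

    /** Fixed pool whose workers carry a recognisable name and are recorded in 'created'. */
    static ExecutorService newNamedPool(String namePrefix, List<Thread> created) {
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, namePrefix + "-worker-" + created.size());
            t.setDaemon(false); // same daemon status the default factory would use
            created.add(t);
            return t;
        };
        return Executors.newFixedThreadPool(2, factory);
    }

    public static void main(String[] args) throws Exception {
        List<Thread> created = new CopyOnWriteArrayList<>();
        ExecutorService pool = newNamedPool("demo-pool", created);

        // Running one task starts one worker; it then idles, waiting for more work.
        pool.submit(() -> { }).get();
        System.out.println("before shutdown: " + liveNonDaemonThreads("demo-pool"));

        // Without this call the idle worker would keep the JVM from exiting.
        pool.shutdown();
        for (Thread t : created) {
            t.join(5000); // wait for the worker thread itself to finish
        }
        System.out.println("after shutdown:  " + liveNonDaemonThreads("demo-pool"));
    }
}
```

If the shutdown() call is removed, the process does not exit after main() 
returns, and a thread dump (for example from jstack) lists the idle worker.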

> RM failed to release the AM container
> -
>
> Key: YARN-2283
> URL: https://issues.apache.org/jira/browse/YARN-2283
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.4.0
> Environment: NM1: AM running
> NM2: Map task running
> mapreduce.map.maxattempts=1
> Reporter: Nishan Shetty
> Priority: Critical
>
> I hit this problem during a container stability test.
> While the job was running, a map task was killed.
> Observed that, even though the application is FAILED, the MRAppMaster process 
> keeps running until timeout because the RM did not release the AM container.
> {code}
> 2014-07-14 14:43:33,899 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1405318134611_0002_01_05 Container Transitioned from RUNNING to 
> COMPLETED
> 2014-07-14 14:43:33,899 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
>  Completed container: container_1405318134611_0002_01_05 in state: 
> COMPLETED event:FINISHED
> 2014-07-14 14:43:33,899 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos 
> OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1405318134611_0002
> CONTAINERID=container_1405318134611_0002_01_05
> 2014-07-14 14:43:33,899 INFO 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
>  Finish information of container container_1405318134611_0002_01_05 is 
> written
> 2014-07-14 14:43:33,899 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: 
> Stored the finish data of container container_1405318134611_0002_01_05
> 2014-07-14 14:43:33,899 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
>  Released container container_1405318134611_0002_01_05 of capacity 
>  on host HOST-10-18-40-153:45026, which currently has 
> 1 containers,  used and  
> available, release resources=true
> 2014-07-14 14:43:33,899 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> default used= numContainers=1 user=testos 
> user-resources=
> 2014-07-14 14:43:33,899 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> completedContainer container=Container: [ContainerId: 
> container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, 
> NodeHttpAddress: HOST-10-18-40-153:45025, Resource: , 
> Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 
> }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.25, 
> absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster= vCores:8>
> 2014-07-14 14:43:33,899 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 
> used= cluster=
> 2014-07-14 14:43:33,899 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting completed queue: root.default stats: default: capacity=1.0, 
> absoluteCapacity=1.0, usedResources=, 
> usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
> 2014-07-14 14:43:33,899 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application attempt appattempt_1405318134611_0002_01 released container 
> container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 
> #containers=1 available=6144 used=2048 with event: FINISHED
> 2014-07-14 14:43:34,924 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Updating application attempt appattempt_1405318134611_0002_01 with final 
> state: FINISHING
> 2014-07-14 14:43:34,924 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING
> 2014-07-14 14:43:34,924 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating 
> application application_1405318134611_0002 with final state: FINISHING
> 2014-07-14 14:43:34,947 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Watcher event type: NodeDataChanged with state:SyncConnected for 
> path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_01

[jira] [Commented] (YARN-2283) RM failed to release the AM container

2014-07-31 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080804#comment-14080804
 ] 

Sunil G commented on YARN-2283:
---

This seems to be a duplicate of MAPREDUCE-5888. 
[~jlowe], could you please confirm whether it is the same issue?


[jira] [Commented] (YARN-2283) RM failed to release the AM container

2014-07-30 Thread Nishan Shetty (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080480#comment-14080480
 ] 

Nishan Shetty commented on YARN-2283:
-

I checked this issue; it does not occur in trunk.
It is reproducible in 2.4.*.


[jira] [Commented] (YARN-2283) RM failed to release the AM container

2014-07-30 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079234#comment-14079234
 ] 

Sunil G commented on YARN-2283:
---

I tried to reproduce this and found that the AM memory is released immediately.
Could you please try to reproduce it again and share the exact steps?

