Aidi Pi created YARN-6976:
-----------------------------

             Summary: Some containers take a long time in KILLING state after 
the application is finished.
                 Key: YARN-6976
                 URL: https://issues.apache.org/jira/browse/YARN-6976
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager, resourcemanager
    Affects Versions: 2.7.3
         Environment: OS: Ubuntu 16.04, Java: JDK1.8, Docker: 
seqenceid/hadoop-2.4.0
            Reporter: Aidi Pi


I use Docker as the container for YARN and run Spark applications. In some runs, 
the resource manager log indicates that the application is done. However, some 
nodemanager logs indicate that the containers on that node are still in 
RUNNING state and only then enter KILLING state. They spend a long time (about 
20s) in KILLING state before terminating.

In this case, 3 containers were still running after the app entered FINISHED 
state.
Below are the tails of the RM and NM logs:

{panel:title=RM log}
2017-08-08 15:11:34,009 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1502226348464_0002 State change from FINISHING to FINISHED
{panel}


{panel:title=NM log}
2017-08-08 15:11:51,277 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000003 transitioned from KILLING to EXITED_WITH_SUCCESS
2017-08-08 15:11:51,277 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000010 transitioned from KILLING to EXITED_WITH_SUCCESS
2017-08-08 15:11:51,277 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000016 transitioned from KILLING to EXITED_WITH_FAILURE
2017-08-08 15:11:51,309 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=eddie        OPERATION=Container Finished - Succeeded        TARGET=ContainerImpl    RESULT=SUCCESS  APPID=application_1502226348464_0002    CONTAINERID=container_1502226348464_0002_01_000003
2017-08-08 15:11:51,351 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000003 transitioned from EXITED_WITH_SUCCESS to DONE
2017-08-08 15:11:51,351 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=eddie        OPERATION=Container Finished - Succeeded        TARGET=ContainerImpl    RESULT=SUCCESS  APPID=application_1502226348464_0002    CONTAINERID=container_1502226348464_0002_01_000010
2017-08-08 15:11:51,351 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000010 transitioned from EXITED_WITH_SUCCESS to DONE
2017-08-08 15:11:51,357 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=eddie        OPERATION=Container Finished - Failed   TARGET=ContainerImpl    RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE    APPID=application_1502226348464_0002    CONTAINERID=container_1502226348464_0002_01_000016
2017-08-08 15:11:51,357 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000016 transitioned from EXITED_WITH_FAILURE to DONE
{panel}
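
To quantify the delay, the timestamp of the RM "FINISHING to FINISHED" line can be compared with the timestamps of the NM "transitioned from KILLING to ..." lines. The following is only a minimal sketch, not part of YARN: the function name killing_delays and the two regular expressions are made up for illustration, and it assumes the RM and NM clocks are synchronized. Because these log tails do not show when each container entered KILLING, the result is an upper bound on the time spent in KILLING rather than the exact duration. Whitespace is matched loosely, so it works whether or not the log lines are wrapped.

```python
import re
from datetime import datetime

# Leading timestamp of each log entry, e.g. "2017-08-08 15:11:34,009".
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"

# Hypothetical patterns written for the log lines quoted in this report.
# \s+ between tokens tolerates line breaks inside a single log entry.
APP_FINISHED = re.compile(
    TS + r"\s+INFO\s+\S+RMAppImpl:\s+(application_\S+)"
    r"\s+State\s+change\s+from\s+FINISHING\s+to\s+FINISHED"
)
LEFT_KILLING = re.compile(
    TS + r"\s+INFO\s+\S+ContainerImpl:\s+Container\s+(container_\S+)"
    r"\s+transitioned\s+from\s+KILLING\s+to\s+(\w+)"
)


def parse_ts(text):
    """Parse a log timestamp; %f reads the 3 millisecond digits."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def killing_delays(rm_log, nm_log):
    """Map container id -> seconds between the app reaching FINISHED
    (RM log) and the container leaving KILLING (NM log)."""
    match = APP_FINISHED.search(rm_log)
    if match is None:
        raise ValueError("no FINISHING -> FINISHED line found in RM log")
    finished = parse_ts(match.group(1))
    return {
        container_id: (parse_ts(ts) - finished).total_seconds()
        for ts, container_id, _new_state in LEFT_KILLING.findall(nm_log)
    }
```

Applied to the RM and NM lines above, this gives roughly 17.3 seconds for each of the three containers (15:11:34,009 to 15:11:51,277).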


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
