Sandy Ryza created YARN-1110:
--------------------------------

             Summary: NodeManager doesn't complete container after transition from LOCALIZED to KILLING
                 Key: YARN-1110
                 URL: https://issues.apache.org/jira/browse/YARN-1110
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.1.0-beta
            Reporter: Sandy Ryza


Multiple containers are sticking around on an NM after they have been killed, still taking up resources.

{code}
2013-08-27 15:56:36,597 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1377559361179_0018_01_001337 by user llama
2013-08-27 15:56:36,597 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=llama        IP=10.20.191.233        OPERATION=Start Container Request       TARGET=ContainerManageImpl      RESULT=SUCCESS  APPID=application_1377559361179_0018    CONTAINERID=container_1377559361179_0018_01_001337
2013-08-27 15:56:36,598 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1377559361179_0018_01_001337 to application application_1377559361179_0018
2013-08-27 15:56:36,598 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1377559361179_0018_01_001337 transitioned from NEW to LOCALIZED
2013-08-27 15:56:36,613 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1377559361179_0018_01_001337
2013-08-27 15:56:36,616 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=llama        IP=10.20.191.233        OPERATION=Stop Container Request        TARGET=ContainerManageImpl      RESULT=SUCCESS  APPID=application_1377559361179_0018    CONTAINERID=container_1377559361179_0018_01_001337
2013-08-27 15:56:36,616 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1377559361179_0018_01_001337 transitioned from LOCALIZED to KILLING
2013-08-27 15:56:36,616 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1377559361179_0018_01_001337
2013-08-27 15:56:36,616 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1377559361179_0018_01_001337 not launched. No cleanup needed to be done
2013-08-27 15:56:36,617 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 18, cluster_timestamp: 1377559361179, }, attemptId: 1, }, id: 402, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
{code}
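To check an NM for other containers left in this state, one option is to track the last "transitioned from ... to ..." message logged for each container and flag those whose last transition was into KILLING. The sketch below is a hypothetical helper (the class StuckKillScanner and its method are not part of Hadoop). It assumes one log entry per line, as in a raw NM log file, and it assumes that a normal kill is followed by a further transition out of KILLING.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: flags containers whose most recent
// logged state transition ended in KILLING.
public class StuckKillScanner {
    // Matches the Container state-transition message format seen in the NM log.
    private static final Pattern TRANSITION = Pattern.compile(
        "Container (container_\\S+) transitioned from (\\S+) to (\\S+)");

    public static List<String> findStuckInKilling(List<String> logLines) {
        // LinkedHashMap keeps containers in first-seen order for stable output.
        Map<String, String> lastState = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = TRANSITION.matcher(line);
            if (m.find()) {
                lastState.put(m.group(1), m.group(3)); // remember the target state
            }
        }
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, String> e : lastState.entrySet()) {
            if (e.getValue().equals("KILLING")) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "2013-08-27 15:56:36,598 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: "
                + "Container container_1377559361179_0018_01_001337 transitioned from NEW to LOCALIZED",
            "2013-08-27 15:56:36,616 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: "
                + "Container container_1377559361179_0018_01_001337 transitioned from LOCALIZED to KILLING");
        System.out.println(findStuckInKilling(lines));
    }
}
```

A container that is legitimately in the middle of being killed when the log ends is also flagged, so the output is a list of candidates to check rather than proof of this bug.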

This is the last time the container is mentioned in the logs. Unlike other completed containers, it never gets a line like this:
{code}
2013-08-27 15:56:38,832 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container <containerID>
{code}
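One reading consistent with the log, offered as an assumption rather than something this report establishes, is that leaving KILLING requires a notification that the container has been killed, and that for a container that was never launched the cleanup step logs "not launched. No cleanup needed to be done" and nothing then sends that notification. The toy Java model below illustrates that reading only: ToyContainer, its states, its methods and the ackUnlaunchedKill flag are all hypothetical and much simpler than the real NodeManager state machine, and the real cause may lie elsewhere.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyContainer, its states and the ackUnlaunchedKill flag are
// hypothetical and much simpler than the NodeManager's real container state machine.
public class ToyContainer {
    enum State { NEW, LOCALIZED, RUNNING, KILLING, DONE }

    private State state = State.NEW;
    private boolean launched = false;
    // Hypothetical switch: should cleanup of a never-launched container still
    // report "killed" so that the container can leave KILLING?
    private final boolean ackUnlaunchedKill;
    private final List<String> log = new ArrayList<>();

    ToyContainer(boolean ackUnlaunchedKill) {
        this.ackUnlaunchedKill = ackUnlaunchedKill;
    }

    private void transition(State next) {
        log.add("transitioned from " + state + " to " + next);
        state = next;
    }

    void localize() { transition(State.LOCALIZED); }

    void launch() {
        launched = true;
        transition(State.RUNNING);
    }

    // Stop request: enter KILLING, then run the (toy) launcher cleanup.
    void kill() {
        transition(State.KILLING);
        cleanup();
    }

    private void cleanup() {
        if (!launched) {
            log.add("not launched. No cleanup needed to be done");
            if (!ackUnlaunchedKill) {
                return; // no "killed" notification: the container stays in KILLING
            }
        }
        // A real cleanup would signal the process here; the toy just acknowledges.
        onKilled();
    }

    // Stands in for the "container killed" notification that ends KILLING.
    private void onKilled() { transition(State.DONE); }

    State state() { return state; }

    List<String> log() { return log; }

    public static void main(String[] args) {
        for (boolean ack : new boolean[] {false, true}) {
            ToyContainer c = new ToyContainer(ack);
            c.localize();
            c.kill();
            System.out.println("ackUnlaunchedKill=" + ack + " -> " + c.state() + " " + c.log());
        }
    }
}
```

With ackUnlaunchedKill set to false, the model goes NEW to LOCALIZED to KILLING, logs the "not launched" line, and never reaches DONE, mirroring the sequence in the log excerpt; with it set to true, it reaches DONE. The model cannot say whether a real fix belongs in the cleanup path, the launch path, or somewhere else.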

--
This message is automatically generated by JIRA.