Eric Yang created YARN-9486:
-------------------------------

             Summary: Docker container exited with failure does not get clean up correctly
                 Key: YARN-9486
                 URL: https://issues.apache.org/jira/browse/YARN-9486
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Eric Yang
            Assignee: Eric Yang


When a Docker container encounters an error and exits prematurely 
(EXITED_WITH_FAILURE), ContainerCleanup does not remove the container. Instead, 
the NodeManager log shows messages like these:

{code}
java.io.IOException: Could not find 
nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_000007//container_1555111445937_0008_01_000007.pid
 in any of the directories
2019-04-15 20:42:16,454 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1555111445937_0008_01_000007 transitioned from RELAUNCHING 
to EXITED_WITH_FAILURE
2019-04-15 20:42:16,455 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
 Cleaning up container container_1555111445937_0008_01_000007
2019-04-15 20:42:16,455 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
 Container container_1555111445937_0008_01_000007 not launched. No cleanup 
needed to be done
2019-04-15 20:42:16,455 WARN 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase        
OPERATION=Container Finished - Failed   TARGET=ContainerImpl    RESULT=FAILURE  
DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE    
APPID=application_1555111445937_0008    
CONTAINERID=container_1555111445937_0008_01_000007
2019-04-15 20:42:16,458 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1555111445937_0008_01_000007 transitioned from 
EXITED_WITH_FAILURE to DONE
2019-04-15 20:42:16,458 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
 Removing container_1555111445937_0008_01_000007 from application 
application_1555111445937_0008
2019-04-15 20:42:16,458 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Stopping resource-monitoring for container_1555111445937_0008_01_000007
2019-04-15 20:42:16,458 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 Considering container container_1555111445937_0008_01_000007 for 
log-aggregation
2019-04-15 20:42:16,804 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Getting container-status for container_1555111445937_0008_01_000007
2019-04-15 20:42:16,804 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Getting localization status for container_1555111445937_0008_01_000007
2019-04-15 20:42:16,804 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Returning ContainerStatus: [ContainerId: 
container_1555111445937_0008_01_000007, ExecutionType: GUARANTEED, State: 
COMPLETE, Capability: <memory:1024, vCores:1>, Diagnostics: ..., ExitStatus: 
-1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE]
2019-04-15 20:42:18,464 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
completed containers from NM context: [container_1555111445937_0008_01_000007]
2019-04-15 20:43:50,476 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Stopping container with container Id: container_1555111445937_0008_01_000007
{code}

No docker rm command is performed for the failed container.
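
The log line "not launched. No cleanup needed to be done" suggests that the cleanup pass returns early when the container is not marked as launched, which would explain why docker rm is never reached. The sketch below is a minimal, self-contained model of that decision, not the NodeManager's actual code: the class CleanupSketch, the method cleanup, and the flag removeEvenIfNotLaunched are hypothetical names introduced for illustration, and the second call shows only one possible remedy, not a committed fix.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the cleanup decision; not the NodeManager's actual code. */
public class CleanupSketch {

    /**
     * Returns the commands a cleanup pass would issue for one container.
     * removeEvenIfNotLaunched is an assumed switch used to illustrate a possible remedy.
     */
    static List<String> cleanup(String containerId, boolean markedLaunched,
                                boolean isDocker, boolean removeEvenIfNotLaunched) {
        List<String> commands = new ArrayList<>();
        if (!markedLaunched && !(isDocker && removeEvenIfNotLaunched)) {
            // Mirrors the log message "not launched. No cleanup needed to be done".
            return commands;
        }
        if (markedLaunched) {
            // A launched container would first be signalled to stop.
            commands.add("signal " + containerId);
        }
        if (isDocker) {
            // The step that is missing from the log in this report.
            commands.add("docker rm " + containerId);
        }
        return commands;
    }

    public static void main(String[] args) {
        String id = "container_1555111445937_0008_01_000007";
        // Reported behavior: failed relaunch, container treated as not launched.
        System.out.println(cleanup(id, false, true, false));
        // Possible remedy (assumption): still remove the Docker container.
        System.out.println(cleanup(id, false, true, true));
    }
}
```

With the flag off, the first call yields an empty command list, which matches the absence of docker rm in the log above. With the flag on, the Docker container is removed even though it was never marked as launched.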



