[ 
https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826580#comment-16826580
 ] 

Hudson commented on YARN-9486:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16466 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16466/])
YARN-9486. Docker container exited with failure does not get clean up (ebadger: 
rev 79d3d35398cb5348cfd62e41e3318ec7a337421a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerCleanup.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerRelaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerCleanup.java


> Docker container exited with failure does not get clean up correctly
> --------------------------------------------------------------------
>
>                 Key: YARN-9486
>                 URL: https://issues.apache.org/jira/browse/YARN-9486
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 3.2.0
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: YARN-9486.001.patch, YARN-9486.002.patch, 
> YARN-9486.003.patch, YARN-9486.004.patch, YARN-9486.005.patch
>
>
> When docker container encounters error and exit prematurely 
> (EXITED_WITH_FAILURE), ContainerCleanup does not remove container, instead we 
> get messages that look like this:
> {code}
> java.io.IOException: Could not find 
> nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_000007//container_1555111445937_0008_01_000007.pid
>  in any of the directories
> 2019-04-15 20:42:16,454 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_000007 transitioned from 
> RELAUNCHING to EXITED_WITH_FAILURE
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Cleaning up container container_1555111445937_0008_01_000007
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Container container_1555111445937_0008_01_000007 not launched. No cleanup 
> needed to be done
> 2019-04-15 20:42:16,455 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase      
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl    
> RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  
>   APPID=application_1555111445937_0008    
> CONTAINERID=container_1555111445937_0008_01_000007
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_000007 transitioned from 
> EXITED_WITH_FAILURE to DONE
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_1555111445937_0008_01_000007 from application 
> application_1555111445937_0008
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Stopping resource-monitoring for container_1555111445937_0008_01_000007
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  Considering container container_1555111445937_0008_01_000007 for 
> log-aggregation
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_1555111445937_0008_01_000007
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting localization status for container_1555111445937_0008_01_000007
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_1555111445937_0008_01_000007, ExecutionType: GUARANTEED, State: 
> COMPLETE, Capability: <memory:1024, vCores:1>, Diagnostics: ..., ExitStatus: 
> -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE]
> 2019-04-15 20:42:18,464 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
> completed containers from NM context: [container_1555111445937_0008_01_000007]
> 2019-04-15 20:43:50,476 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Stopping container with container Id: container_1555111445937_0008_01_000007
> {code}
> There is no docker rm command performed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to