Aidi Pi created YARN-6976:
-----------------------------

             Summary: Some containers take a long time in KILLING state after the application is finished.
                 Key: YARN-6976
                 URL: https://issues.apache.org/jira/browse/YARN-6976
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager, resourcemanager
    Affects Versions: 2.7.3
         Environment: OS: Ubuntu 16.04, Java: JDK1.8, Docker: seqenceid/hadoop-2.4.0
            Reporter: Aidi Pi

I use Docker as the container for YARN and run Spark applications. In some runs, the ResourceManager log indicates that the application is done. However, some NodeManager logs indicate that the containers on that node are still in the RUNNING state and then enter the KILLING state. They spend a long time (about 20s) in the KILLING state before terminating. In this case, 3 containers were still running after the app entered the FINISHED state.

Below are the tails of the RM and NM logs:

{panel:title=RM log}
2017-08-08 15:11:34,009 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1502226348464_0002 State change from FINISHING to FINISHED
{panel}

{panel:title=NM log}
2017-08-08 15:11:51,277 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000003 transitioned from KILLING to EXITED_WITH_SUCCESS
2017-08-08 15:11:51,277 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000010 transitioned from KILLING to EXITED_WITH_SUCCESS
2017-08-08 15:11:51,277 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000016 transitioned from KILLING to EXITED_WITH_FAILURE
2017-08-08 15:11:51,309 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=eddie OPERATION=Container Finished - Succeeded TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1502226348464_0002 CONTAINERID=container_1502226348464_0002_01_000003
2017-08-08 15:11:51,351 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000003 transitioned from EXITED_WITH_SUCCESS to DONE
2017-08-08 15:11:51,351 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=eddie OPERATION=Container Finished - Succeeded TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1502226348464_0002 CONTAINERID=container_1502226348464_0002_01_000010
2017-08-08 15:11:51,351 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000010 transitioned from EXITED_WITH_SUCCESS to DONE
2017-08-08 15:11:51,357 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=eddie OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1502226348464_0002 CONTAINERID=container_1502226348464_0002_01_000016
2017-08-08 15:11:51,357 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1502226348464_0002_01_000016 transitioned from EXITED_WITH_FAILURE to DONE
{panel}
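
The gap between the RM marking the application FINISHED and the containers leaving KILLING on the NM can be read off the log timestamps quoted in this report. The following is only a minimal sketch for doing that over the excerpts: the function names (`parse_ts`, `kill_delays`) are made up for illustration, the regular expressions are written against the log lines shown here rather than any documented log format, and it assumes the RM and NM clocks are synchronized. It measures the time from the application's FINISHED transition to each container's exit from KILLING, not the time spent in KILLING itself, because the excerpt does not include the lines where the containers entered KILLING.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the quoted logs: "2017-08-08 15:11:34,009"
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
# Patterns derived from the excerpts in this report (an assumption, not a spec).
APP_FINISHED_RE = re.compile(r"State change from FINISHING to FINISHED")
CONTAINER_EXIT_RE = re.compile(
    r"Container (container_\S+) transitioned from KILLING to (\S+)"
)


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), TS_FORMAT) if m else None


def kill_delays(rm_lines, nm_lines):
    """Map container id -> seconds between app FINISHED and leaving KILLING."""
    finished_at = next(
        (parse_ts(l) for l in rm_lines if APP_FINISHED_RE.search(l)), None
    )
    if finished_at is None:
        return {}
    delays = {}
    for line in nm_lines:
        m = CONTAINER_EXIT_RE.search(line)
        ts = parse_ts(line)
        if m and ts:
            delays[m.group(1)] = (ts - finished_at).total_seconds()
    return delays


# Log lines copied from the RM and NM excerpts in this report.
NM_PREFIX = (
    "2017-08-08 15:11:51,277 INFO org.apache.hadoop.yarn.server.nodemanager."
    "containermanager.container.ContainerImpl: Container "
)
rm_log = [
    "2017-08-08 15:11:34,009 INFO org.apache.hadoop.yarn.server.resourcemanager."
    "rmapp.RMAppImpl: application_1502226348464_0002 "
    "State change from FINISHING to FINISHED",
]
nm_log = [
    NM_PREFIX + "container_1502226348464_0002_01_000003 "
    "transitioned from KILLING to EXITED_WITH_SUCCESS",
    NM_PREFIX + "container_1502226348464_0002_01_000010 "
    "transitioned from KILLING to EXITED_WITH_SUCCESS",
    NM_PREFIX + "container_1502226348464_0002_01_000016 "
    "transitioned from KILLING to EXITED_WITH_FAILURE",
]

delays = kill_delays(rm_log, nm_log)
for container_id, seconds in sorted(delays.items()):
    print(f"{container_id}: {seconds:.3f}s after app FINISHED")
```

For the three containers above, the computed gap is the difference between 15:11:51,277 and 15:11:34,009. Running the same sketch over the full NM log, including the RUNNING to KILLING transitions, would show how much of that gap is spent in KILLING.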