[ https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343000#comment-14343000 ]
Beckham007 commented on YARN-3080:
----------------------------------

[~vvasudev] kill -9 does not work in this situation.
{quote}
[gaia@c112 ~]$ ps -ef|grep 7188
gaia 7188 16807 0 14:46 ? 00:00:00 bash /data/gaia/data/yarn/local/usercache/gaia/appcache/application_1424999012322_0960/container_1424999012322_0960_01_000002/docker_container_executor.sh
gaia 7190 7188 0 14:46 ? 00:00:00 /usr/bin/docker run --rm --name container_1424999012322_0960_01_000002 -e GAIA_HOST_IP=10.149.27.112 -e GAIA_API_SERVER=http://shpc-test.api.oa.com/api -e GAIA_CLUSTER_ID=shpc-test -e GAIA_QUEUE=root.gaia -e GAIA_APP_NAME=dev_gaia -e GAIA_INSTANCE_ID=1 -e GAIA_CONTAINER_ID=container_1424999012322_0960_01_000002 --memory=256M --cpu-shares=1020 -v /data/gaia/logs/container-logs/application_1424999012322_0960/container_1424999012322_0960_01_000002:/data/gaia/logs/container-logs/application_1424999012322_0960/container_1424999012322_0960_01_000002 -v /data/gaia/data/yarn/local/usercache/gaia/appcache/application_1424999012322_0960/container_1424999012322_0960_01_000002:/data/gaia/data/yarn/local/usercache/gaia/appcache/application_1424999012322_0960/container_1424999012322_0960_01_000002 -P docker.oa.com:8080/library/dev_gaia_repo:v2 bash /data/gaia/data/yarn/local/usercache/gaia/appcache/application_1424999012322_0960/container_1424999012322_0960_01_000002/launch_container.sh
gaia 26414 32596 0 18:10 pts/12 00:00:00 grep 7188
[gaia@c112 ~]$ kill -9 7188
[gaia@c112 ~]$ ps -ef|grep 7188
gaia 26709 32596 0 18:10 pts/12 00:00:00 grep 7188
{quote}
The wrapper script (7188) is gone, but the docker run process (7190) survives and its parent pid has changed to 1.
{quote}
[gaia@c112 ~]$ ps -ef|grep 7190
gaia 7190 1 0 14:46 ? 00:00:00 /usr/bin/docker run --rm --name container_1424999012322_0960_01_000002 -e GAIA_HOST_IP=10.149.27.112 -e GAIA_API_SERVER=http://shpc-test.api.oa.com/api -e GAIA_CLUSTER_ID=shpc-test -e GAIA_QUEUE=root.gaia -e GAIA_APP_NAME=dev_gaia -e GAIA_INSTANCE_ID=1 -e GAIA_CONTAINER_ID=container_1424999012322_0960_01_000002 --memory=256M --cpu-shares=1020 -v /data/gaia/logs/container-logs/application_1424999012322_0960/container_1424999012322_0960_01_000002:/data/gaia/logs/container-logs/application_1424999012322_0960/container_1424999012322_0960_01_000002 -v /data/gaia/data/yarn/local/usercache/gaia/appcache/application_1424999012322_0960/container_1424999012322_0960_01_000002:/data/gaia/data/yarn/local/usercache/gaia/appcache/application_1424999012322_0960/container_1424999012322_0960_01_000002 -P docker.oa.com:8080/library/dev_gaia_repo:v2 bash /data/gaia/data/yarn/local/usercache/gaia/appcache/application_1424999012322_0960/container_1424999012322_0960_01_000002/launch_container.sh
gaia 28687 32596 0 18:11 pts/12 00:00:00 grep 7190
{quote}
And the docker container is still running:
{quote}
[gaia@c112 ~]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
235a2dc20c56 docker.oa.com:8080/library/dev_gaia_repo:v2 "/etc/rc.local bash 3 hours ago Up 3 hours 0.0.0.0:49861->36000/tcp container_1424999012322_0960_01_000002
{quote}

> The DockerContainerExecutor could not write the right pid to container pidFile
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3080
>                 URL: https://issues.apache.org/jira/browse/YARN-3080
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Beckham007
>            Assignee: Abin Shahab
>         Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, YARN-3080.patch
>
>
> The docker_container_executor_session.sh looks like this:
> {quote}
> #!/usr/bin/env bash
> echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_000002` > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid.tmp
> /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid
> /usr/bin/docker run --rm --name container_1421723685222_0008_01_000002 -e GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e GAIA_CONTAINER_ID=container_1421723685222_0008_01_000002 --memory=32M --cpu-shares=1024 -v /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_000002:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_000002 -v /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002 -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash "/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002/launch_container.sh"
> {quote}
> The DockerContainerExecutor runs docker inspect before docker run, so docker inspect cannot return the right pid for the container (it does not exist yet), and both signalContainer() and NM restart fail.
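
To make the ordering problem in the description concrete, here is a minimal sketch of a session script that starts the container first and only then asks docker for its pid. It is not the attached patch and not what DockerContainerExecutor generates. The function name `launch_and_record_pid` and the `DOCKER_BIN` variable are hypothetical, and running detached with `-d` (without `--rm`) is an assumption made so the script can continue after the container starts.

```shell
#!/usr/bin/env bash
# Sketch only: launch the container, then record its pid.
# DOCKER_BIN and launch_and_record_pid are hypothetical names for illustration.

DOCKER_BIN="${DOCKER_BIN:-/usr/bin/docker}"

# Usage: launch_and_record_pid NAME PID_FILE IMAGE_AND_ARGS...
launch_and_record_pid() {
  local name="$1" pid_file="$2"
  shift 2

  # Start the container detached, so it exists before we inspect it.
  "$DOCKER_BIN" run -d --name "$name" "$@" >/dev/null || return 1

  # Only now can docker inspect report the pid of the container's main process.
  local pid
  pid="$("$DOCKER_BIN" inspect --format '{{.State.Pid}}' "$name")" || return 1

  # Write to a temporary file and rename it, as the original script does,
  # so a reader never sees a half-written pid file.
  echo "$pid" > "${pid_file}.tmp"
  mv -f "${pid_file}.tmp" "$pid_file"
}
```

A caller would still have to block until the container exits (for example with `docker wait`) and remove it afterwards, since this sketch drops `--rm`. The pid written here is the container's main process as reported by docker, not the `docker run` client process seen in the ps output above.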