[ 
https://issues.apache.org/jira/browse/YARN-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539290#comment-16539290
 ] 

Jim Brennan commented on YARN-8515:
-----------------------------------

I have been able to repro this reliably on a test cluster.
Repro steps are:
# Start sleep job with a lot of mappers sleeping for 50 seconds
# on one worker node, kill NM after a set of containers starts
# restart the NM
# On the gw, kill the application (before the current containers finish)

This will leave the containers on the node where the nodemanager was restarted 
in the exited state.

container-executor is not cleaning up the docker containers. Here is an strace 
of one of the container-executors when the application is killed:
{noformat}
-bash-4.2$ sudo strace -s 4096 -f -p 7176
strace: Process 7176 attached
read(3, "143\n", 4096) = 4
close(3) = 0
wait4(7566, [\{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 7566
--- SIGCHLD \{si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=7566, si_uid=0, 
si_status=0, si_utime=1, si_stime=0} ---
munmap(0x7f233bfa4000, 4096) = 0
write(2, "Docker container exit code was not zero: 143\n", 45) = -1 EPIPE 
(Broken pipe)
--- SIGPIPE \{si_signo=SIGPIPE, si_code=SI_USER, si_pid=7176, si_uid=0} ---
+++ killed by SIGPIPE +++
{noformat}

The problem is that when container-executor is started by the NM using the 
priviledged operation executor, it attaches stream readers to stdout and stderr.
When we restart the NM, these threads are killed. Then when the application is 
killed, it kills the running containers and container-executor returns from 
waiting for the docker container. When it tries to write an error message to 
stderr, it generates a SIGPIPE signal, because the other end of the pipe has 
been killed. Since we are not handling that signal, container-executor crashes 
and we never remove the docker container.

I have verified that if I change container-executor to ignore SIGPIPE, the 
problem does not occur.

> container-executor can crash with SIGPIPE after nodemanager restart
> -------------------------------------------------------------------
>
>                 Key: YARN-8515
>                 URL: https://issues.apache.org/jira/browse/YARN-8515
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>
> When running with docker on large clusters, we have noticed that sometimes 
> docker containers are not removed - they remain in the exited state, and the 
> corresponding container-executor is no longer running.  Upon investigation, 
> we noticed that this always seemed to happen after a nodemanager restart.   
> The sequence leading to the stranded docker containers is:
>  # Nodemanager restarts
>  # Containers are recovered and then run for a while
>  # Containers are killed for some (legitimate) reason
>  # Container-executor exits without removing the docker container.
> After reproducing this on a test cluster, we found that the 
> container-executor was exiting due to a SIGPIPE.
> What is happening is that the shell command executor that is used to start 
> container-executor has threads reading from c-e's stdout and stderr.  When 
> the NM is restarted, these threads are killed.  Then when the 
> container-executor continues executing after the container exits with error, 
> it tries to write to stderr (ERRORFILE) and gets a SIGPIPE.  Since SIGPIPE is 
> not handled, this crashes the container-executor before it can actually 
> remove the docker container.
> We ran into this in branch 2.8.  The way docker containers are removed has 
> been completely redesigned in trunk, so I don't think it will lead to this 
> exact failure, but after an NM restart, potentially any write to stderr or 
> stdout in the container-executor could cause it to crash.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to