Hello, all. I'm using s6 as the init process manager in a Docker container, via s6-overlay. Everything's working fine, except that when I send a SIGINT to the container, the managed processes exit but become zombies and aren't reaped, forcing the system to time out (twice, actually).

I'm using ubuntu:20.04 as the container image with the amd64 build of s6-overlay 2.2.0.3, which I believe has the latest s6. It all runs on an Ubuntu 18.04 host. It looks like s6-svscan sends SIGINT or SIGTERM to the processes, and then uses s6-svwait to wait for the processes to exit, but the zombie processes are never reaped.

I found the following reference that suggests the problem might be a kernel problem: https://github.com/just-containers/s6-overlay/issues/135 , although I'm not seeing the high zombie CPU usage referenced. I also found https://wiki.gentoo.org/wiki/S6 , which suggested that sending a SIGCHLD to s6-svscan would cause it to re-scan for zombies; that didn't work.

Here are the processes once everything is started (viewed by "ps axl" after running bash in a separate connection to the container):
root@4fa66da81d02:/# ps axl
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0 20   0    196     4 poll_s Ss+  pts/0      0:00 s6-svscan -t0 /var/run/s6/services
4     0    35     1 20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise s6-fdholderd
4     0   228     1 20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise thttpd
4     0   229     1 20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise exrouter
4 65534   232   228 30  10 179052 165784 poll_s SNs ?          0:00 /opt/pdm/bin/thttpd -nip -nos -c **.html|**.sh|
4     0   233   229 30  10   6224  1568 poll_s SNs  ?          0:00 /opt/pdm/bin/exrouter-cpp
4     0   247     0 20   0   5996  3756 do_wai Ss   pts/1      0:00 bash
4     0   255   247 20   0   7568  3024 -      R+   pts/1      0:00 ps axl

And, once I issue a ^C to the container, but before any timeout:
root@4fa66da81d02:/# ps axl
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0 20   0    176     4 do_wai Ss+  pts/0      0:00 foreground  backtick -D  3000  -n  S6_SERVICES
4 65534   232     1 30  10      0     0 -      ZNs  ?          0:00 [thttpd] <defunct>
4     0   233     1 30  10      0     0 -      ZNs  ?          0:00 [exrouter-cpp] <defunct>
4     0   247     0 20   0   5996  3860 do_wai Ss   pts/1      0:00 bash
0     0   271     1 20   0    176     4 do_wai S+   pts/0      0:00 foreground  s6-svwait -D  -t  10000  /var/run/
4     0   278   271 20   0    204     8 poll_s S+   pts/0      0:00 s6-svwait -D -t 10000 /var/run/s6/services/thtt
4     0   279   278 20   0    452     4 poll_s S+   pts/0      0:00 s6-ftrigrd
4     0   280   247 20   0   7568  2976 -      R+   pts/1      0:00 ps axl

And, after the system times out and sends SIGTERM to all the processes:
root@4fa66da81d02:/# ps axl
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0 20   0    176     4 do_wai Ss+  pts/0      0:00 foreground  backtick -D  3000  -n  S6_KILL_GRA
4 65534   232     1 30  10      0     0 -      ZNs  ?          0:00 [thttpd] <defunct>
4     0   233     1 30  10      0     0 -      ZNs  ?          0:00 [exrouter-cpp] <defunct>
4     0   279     1 20   0      0     0 -      Z+   pts/0      0:00 [s6-ftrigrd] <defunct>
0     0   285     1 20   0    168     4 poll_s S+   pts/0      0:00 s6-sleep -m -- 10000
4     0   292     0 20   0   5992  3760 do_wai Ss   pts/1      0:00 bash
4     0   300   292 20   0   7568  3080 -      R+   pts/1      0:00 ps axl

You can see:

- The managed processes are "thttpd" and "exrouter"
- I bumped the timeouts to 10000ms for the above tests
- When s6-svscan decides to exit, it sends signals to all the managed processes, and the s6-supervise processes exit, but the two managed processes become zombies (now parented to PID 1) and aren't reaped
- Timing out still doesn't clear the thttpd or exrouter zombies (although it does kill bash, so I had to reconnect to gather the third "ps axl")
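
For anyone trying to reproduce this, the listings can be narrowed to just the defunct processes and their parents with something like the following. This is only a sketch, not what produced the output above: filter_zombies is a helper name made up here, and it assumes procps "ps" and "awk" are available in the container.

```shell
# Keep the header line plus every row whose STAT column starts with "Z"
# (defunct/zombie). Expects "ps -eo pid,ppid,stat,comm" output on stdin.
# filter_zombies is a hypothetical helper name, not part of s6 or s6-overlay.
filter_zombies() {
    awk 'NR == 1 || $3 ~ /^Z/'
}
```

Running "ps -eo pid,ppid,stat,comm | filter_zombies" in the second shell during shutdown should show whether the zombies' PPID is 1, i.e. whether it is PID 1 that is failing to reap them.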

It's easy to cut the timeout to, say, 100ms, but I'd much rather have a correct shutdown sequence, as that's why I switched to s6 in the first place.


Any ideas?

Thanks,
Dan

--
Daniel T. Griscom
152 Cochrane Street, Melrose, MA 02176-1433
(781) 662-9447  gris...@suitable.com  http://www.suitable.com/
