s6-svwait not reaping zombies?

Daniel Griscom Wed, 21 Jul 2021 12:19:41 -0700

Hello, all. I'm using s6 as the init process manager in a Docker container, using s6-overlay. Everything's working fine, but when I send a SIGINT to the container, the processes being managed exit but become zombies and aren't reaped, forcing the system to time out (twice, actually).

I'm using ubuntu:20.04 as a container with s6-overlay amd64 version 2.2.0.3, which I believe has the latest s6. It all runs on an Ubuntu 18.04 system. It looks like s6-svscan sends SIGINT or SIGTERM to the processes, and then uses s6-svwait to wait for the processes to exit, but the zombie processes are never reaped.

I found the following reference that suggests the problem might be a kernel problem: https://github.com/just-containers/s6-overlay/issues/135 , although I'm not seeing the high zombie CPU usage referenced. I also found https://wiki.gentoo.org/wiki/S6 , which suggested that sending a SIGCHLD to s6-svscan would cause it to re-scan for zombies; that didn't work.
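For context, the reaping that the SIGCHLD suggestion relies on generally amounts to the parent calling waitpid() on children that have already exited. Below is a minimal Python sketch of that general pattern. It is not how s6-svscan is implemented, and the helper names (spawn_short_lived_children, reap_exited_children, wait_for_all) are made up for illustration.

```python
import os
import time


def spawn_short_lived_children(count):
    """Fork `count` children that exit immediately (hypothetical helper)."""
    pids = []
    for exit_code in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once. It stays a zombie until the parent waits on it.
            os._exit(exit_code)
        pids.append(pid)
    return pids


def reap_exited_children():
    """Reap every child that has already exited, without blocking."""
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return reaped


def wait_for_all(pids, timeout=5.0):
    """Poll reap_exited_children() until all `pids` are collected or time runs out."""
    collected = {}
    deadline = time.monotonic() + timeout
    while set(pids) - set(collected) and time.monotonic() < deadline:
        collected.update(reap_exited_children())
        time.sleep(0.01)
    return collected


if __name__ == "__main__":
    children = spawn_short_lived_children(3)
    print(wait_for_all(children))  # maps each child PID to its exit code
```

A process that never makes a call like this leaves its exited children in the "Z" state indefinitely, which is the symptom described in this message.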

Here are the processes once everything is started (viewed by "ps axl" after running bash in a separate connection to the container):

root@4fa66da81d02:/# ps axl
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0  20   0    196     4 poll_s Ss+  pts/0      0:00 s6-svscan -t0 /var/run/s6/services
4     0    35     1  20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise s6-fdholderd
4     0   228     1  20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise thttpd
4     0   229     1  20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise exrouter
4 65534   232   228  30  10 179052 165784 poll_s SNs  ?          0:00 /opt/pdm/bin/thttpd -nip -nos -c **.html|**.sh|
4     0   233   229  30  10   6224  1568 poll_s SNs  ?          0:00 /opt/pdm/bin/exrouter-cpp
4     0   247     0  20   0   5996  3756 do_wai Ss   pts/1      0:00 bash
4     0   255   247  20   0   7568  3024 -      R+   pts/1      0:00 ps axl

And, once I issue a ^C to the container, but before any timeout:

root@4fa66da81d02:/# ps axl
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0  20   0    176     4 do_wai Ss+  pts/0      0:00 foreground backtick -D 3000 -n S6_SERVICES
4 65534   232     1  30  10      0     0 -      ZNs  ?          0:00 [thttpd] <defunct>
4     0   233     1  30  10      0     0 -      ZNs  ?          0:00 [exrouter-cpp] <defunct>
4     0   247     0  20   0   5996  3860 do_wai Ss   pts/1      0:00 bash
0     0   271     1  20   0    176     4 do_wai S+   pts/0      0:00 foreground s6-svwait -D -t 10000 /var/run/
4     0   278   271  20   0    204     8 poll_s S+   pts/0      0:00 s6-svwait -D -t 10000 /var/run/s6/services/thtt
4     0   279   278  20   0    452     4 poll_s S+   pts/0      0:00 s6-ftrigrd
4     0   280   247  20   0   7568  2976 -      R+   pts/1      0:00 ps axl

And, after the system times out and sends SIGTERM to all the processes:

root@4fa66da81d02:/# ps axl
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0  20   0    176     4 do_wai Ss+  pts/0      0:00 foreground backtick -D 3000 -n S6_KILL_GRA
4 65534   232     1  30  10      0     0 -      ZNs  ?          0:00 [thttpd] <defunct>
4     0   233     1  30  10      0     0 -      ZNs  ?          0:00 [exrouter-cpp] <defunct>
4     0   279     1  20   0      0     0 -      Z+   pts/0      0:00 [s6-ftrigrd] <defunct>
0     0   285     1  20   0    168     4 poll_s S+   pts/0      0:00 s6-sleep -m -- 10000
4     0   292     0  20   0   5992  3760 do_wai Ss   pts/1      0:00 bash
4     0   300   292  20   0   7568  3080 -      R+   pts/1      0:00 ps axl


You can see:

- The managed processes are "thttpd" and "exrouter"
- I bumped the timeouts to 10000ms for the above tests

- When s6-svscan decides to exit, it sends signals to all the managed processes, and the s6-supervise processes exit, but the two managed processes become zombies and aren't reaped
- Timing out still doesn't kill thttpd or exrouter (although it does kill bash, so I had to reconnect to gather the third "ps axl")
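
For anyone who wants to repeat this check, the zombie rows can be picked out of "ps axl" output mechanically: they are the rows whose STAT field starts with "Z". The following is a minimal Python sketch of that filter, assuming the standard "ps axl" column layout. The function name find_zombies and the SAMPLE text are illustrative only; SAMPLE reuses a few rows from the second listing in this message.

```python
SAMPLE = """\
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0  20   0    176     4 do_wai Ss+  pts/0      0:00 foreground backtick -D 3000 -n S6_SERVICES
4 65534   232     1  30  10      0     0 -      ZNs  ?          0:00 [thttpd] <defunct>
4     0   233     1  30  10      0     0 -      ZNs  ?          0:00 [exrouter-cpp] <defunct>
4     0   247     0  20   0   5996  3860 do_wai Ss   pts/1      0:00 bash
"""


def find_zombies(ps_axl_output):
    """Return (pid, ppid, command) for every row whose STAT starts with 'Z'."""
    lines = ps_axl_output.strip().splitlines()
    header = lines[0].split()
    pid_i = header.index("PID")
    ppid_i = header.index("PPID")
    stat_i = header.index("STAT")
    cmd_i = header.index("COMMAND")

    zombies = []
    for line in lines[1:]:
        # Split only up to COMMAND so a command containing spaces stays whole.
        fields = line.split(None, cmd_i)
        if len(fields) <= cmd_i:
            continue  # skip short or malformed rows
        if fields[stat_i].startswith("Z"):
            zombies.append((int(fields[pid_i]), int(fields[ppid_i]), fields[cmd_i]))
    return zombies


if __name__ == "__main__":
    for pid, ppid, command in find_zombies(SAMPLE):
        print(f"zombie pid={pid} parent={ppid} command={command}")
```

The PPID column is the useful part: it names the process that would have to wait on each zombie, which in the listings above is PID 1.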

It's easy to cut the timeout to, say, 100ms, but I'd much rather have a correct shutdown sequence, as that's why I switched to s6 in the first place.



Any ideas?

Thanks,
Dan

--
Daniel T. Griscom
152 Cochrane Street, Melrose, MA 02176-1433
(781) 662-9447  gris...@suitable.com  http://www.suitable.com/
