I want to confirm my understanding of the problem. In both cases where there are unreaped zombies, 
the zombies properly have PPID 1, but PID 1 is the execline "foreground" command, which 
doesn't look for and/or reap zombies. Would adding zombie reaping to "foreground" 
(probably by periodically calling wait_reap()) solve this problem?

(I understand that this would be an ugly feature to add to "foreground"; I just 
want to make sure I understand the problem.)

 There are several distinct issues and I haven't identified all of them;
what I know is that they all stem from having to perform operations
after s6-svscan has exec'ed (which means that 1. the supervision tree
isn't operational anymore and 2. whatever is running as pid 1 may or
may not be reaping zombies).

 The issue that you are noticing is, as you correctly identified, 2:
there are processes that died after s6-svscan exited its loop, and
init-stage3, which is running as pid 1, is a sequence of programs that
do not reap zombies. If no reaping at all is performed before the
container is shut down, those zombies will remain.
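
 To make "reaping" concrete, here is a minimal sketch in Python (not
s6 or execline code; the function name reap_zombies is made up for
illustration). It assumes the caller is the parent of the dead
processes, as pid 1 is for orphans, and it collects every child that
has already exited without blocking on the ones still running:

```python
import os

def reap_zombies():
    """Reap every already-dead child without blocking; return the reaped pids.

    Hypothetical helper for illustration only.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call return at once
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append(pid)
    return reaped
```

 A program that never makes a call of this kind leaves its dead
children in the process table, which is what the ps output shows.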

 "foreground" is one of those programs, but you only see it in the
ps list because its child, s6-svwait, is hanging until it times out,
and that is due to 1. Making foreground wait() would be a small, ad-hoc
band-aid; the real solution is to make foreground's child stop
hanging, and call execline's wait program as pid 1 before the container
exits.
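
 As a rough illustration of "wait as pid 1 before exiting", the sketch
below blocks until every child is gone. It is written in Python and is
only an assumption about the general idea, not how execline's wait is
implemented; the name wait_for_all_children is hypothetical.

```python
import os

def wait_for_all_children():
    """Block until every child has exited; return {pid: raw wait status}.

    Hypothetical helper for illustration only.
    """
    statuses = {}
    while True:
        try:
            pid, status = os.wait()  # blocks until some child exits
        except ChildProcessError:
            return statuses  # no children remain, so nothing is left to reap
        statuses[pid] = status
```

 Unlike a periodic non-blocking check, this runs once at the end and
returns only when nothing is left, which is why it also needs the
hanging child to stop hanging first.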

 Again, I did perform some analysis a few months ago and came to the
conclusion that writing a new version of s6-overlay would be less effort
than patching the current version to high heaven. But since the new
version won't be out for a while, some tweaks are definitely needed for
the current version - but I'd rather delegate all of them to John. :)

--
 Laurent
