Re: s6-svwait not reaping zombies?

2021-07-22 Thread Laurent Bercot

So, I should raise this as a GitHub s6-overlay issue?


 Yes, please.

--
 Laurent



Re: s6-svwait not reaping zombies?

2021-07-22 Thread Daniel Griscom
Thanks; submitted here:
https://github.com/just-containers/s6-overlay/issues/350


I want to confirm my understanding of the problem. In both cases where 
there are unreaped zombies, the zombies properly have PPID 1, but PID 1 
is the execline "foreground" command, which doesn't look for and/or reap 
zombies. Would adding zombie reaping to "foreground" (probably by 
periodically calling wait_reap()) solve this problem?


(I understand that this would be an ugly feature to add to "foreground"; 
I just want to make sure I understand the problem.)



Thanks,
Dan



--
Daniel T. Griscom
152 Cochrane Street, Melrose, MA 02176-1433
(781) 662-9447  gris...@suitable.com  http://www.suitable.com/


Re: s6-svwait not reaping zombies?

2021-07-22 Thread Laurent Bercot

I want to confirm my understanding of the problem. In both cases where there are unreaped zombies, 
the zombies properly have PPID 1, but PID 1 is the execline "foreground" command, which 
doesn't look for and/or reap zombies. Would adding zombie reaping to "foreground" 
(probably by periodically calling wait_reap()) solve this problem?

(I understand that this would be an ugly feature to add to "foreground"; I just 
want to make sure I understand the problem.)


 There are several distinct issues, and I haven't identified all of
them; what I know is that they all stem from having to perform
operations after s6-svscan has exec'ed (which means that 1. the
supervision tree isn't operational anymore and 2. whatever is running
as pid 1 may or may not be reaping zombies).

 The issue that you are noticing is, as you correctly identified, 2:
there are processes that died after s6-svscan exited its loop, and
init-stage3, which is running as pid 1, is a sequence of programs that
do not reap zombies. If no reaping at all is performed before the
container is shut down, those zombies will remain.
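
 For readers unsure what "reaping" involves here, the following is a
minimal sketch in Python (not the language s6 or execline are written
in) of a non-blocking reap loop, which is roughly the kind of work a
periodic wait_reap() call would do, as far as I understand it. The
function name reap_zombies is hypothetical; only os.waitpid() with
WNOHANG is a standard call.

```python
import os


def reap_zombies():
    """Reap every child that has already exited, without blocking.

    Hypothetical helper for illustration; returns the list of PIDs reaped.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call return at once.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # ECHILD: this process has no children at all.
            break
        if pid == 0:
            # Children exist, but none of them has exited yet.
            break
        reaped.append(pid)
    return reaped
```

 A process running as pid 1 inherits orphans as its own children, so a
loop like this would collect them too; a program that never calls
waitpid() leaves them in the process table as zombies.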

 "foreground" is one of those programs, but you only see it in the
ps list because its child, s6-svwait, is hanging until it times out,
and that is due to 1. Making foreground wait() would be a small, ad-hoc
band-aid; the real solution is to make foreground's child stop
hanging, and call execline's wait program as pid 1 before the container
exits.
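
 As an illustration of that last step, here is a minimal Python sketch
of a blocking "wait for every remaining child" pass, which is the
behaviour I would expect from running a wait program with no specific
pids as pid 1 just before exit. It is an assumption-laden sketch, not
execline's implementation; the name wait_for_all_children is
hypothetical.

```python
import os


def wait_for_all_children():
    """Block until every child of this process has exited and been reaped.

    Hypothetical helper for illustration; returns the list of PIDs reaped.
    """
    reaped = []
    while True:
        try:
            # Blocks until some child exits, then collects its status.
            pid, _status = os.wait()
        except ChildProcessError:
            # ECHILD: no children left, so nothing can remain a zombie.
            break
        reaped.append(pid)
    return reaped
```

 Unlike a non-blocking loop, this only returns once there are no
children left, which is why it depends on the hanging child being
fixed first: a child that never exits would stall the shutdown.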

 Again, I did perform some analysis a few months ago and came to the
conclusion that writing a new version of s6-overlay would be less effort
than patching the current version to high heaven. But since the new
version won't be out for a while, some tweaks are definitely needed for
the current version - but I'd rather delegate all of them to John. :)

--
 Laurent