Re: s6-svwait not reaping zombies?

2021-07-22 Thread Laurent Bercot

I want to confirm my understanding of the problem. In both cases where there are unreaped zombies, 
the zombies properly have PPID 1, but PID 1 is the execline "foreground" command, which 
doesn't look for and/or reap zombies. Would adding zombie reaping to "foreground" 
(probably by periodically calling wait_reap()) solve this problem?

(I understand that this would be an ugly feature to add to "foreground"; I just 
want to make sure I understand the problem.)


 There are several unrelated issues and I haven't identified all of
them; what I know is that they're all related to having to perform operations
after s6-svscan has exec'ed (which means that 1. the supervision tree
isn't operational anymore and 2. whatever is running as pid 1 may or
may not be reaping zombies).

 The issue that you are noticing is, as you correctly identified, 2:
there are processes that died after s6-svscan exited its loop, and
init-stage3, which is running as pid 1, is a sequence of programs that
do not reap zombies. If no reaping at all is performed before the
container is shut down, those zombies will remain.

 "foreground" is one of those programs, but you only see it in the
ps list because its child, s6-svwait, is hanging until it times out,
and that is due to 1. Making foreground wait() would be a small, ad-hoc
band-aid; the real solution is to make foreground's child stop
hanging, and call execline's wait program as pid 1 before the container
exits.
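
 To illustrate what "reaping" means here, the following is a minimal
sketch in Python, not code from s6, execline or s6-overlay. The function
name reap_all_nonblocking is hypothetical. It collects every child that
has already exited, without blocking; a process running as pid 1 that
never does something like this leaves its dead children (including
orphans reparented to it) in the zombie state.

```python
import os


def reap_all_nonblocking():
    """Reap every child that has already exited, without blocking.

    Returns a list of (pid, raw_wait_status) tuples. This is an
    illustrative helper only; the name is an assumption, not an s6 API.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call return
            # immediately instead of waiting for a child to exit.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # The calling process has no children at all.
            break
        if pid == 0:
            # Children exist, but none of them has exited yet.
            break
        reaped.append((pid, status))
    return reaped
```

 Calling such a function only once can miss children that die later,
which is why a blocking wait before the container exits is the more
robust choice.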

 Again, I did perform some analysis a few months ago and came to the
conclusion that writing a new version of s6-overlay would be less effort
than patching the current version to high heavens. But since the new
version won't be out for a while, some tweaks are definitely needed for
the current version - but I'd rather delegate all of them to John. :)

--
 Laurent



Re: s6-svwait not reaping zombies?

2021-07-22 Thread Daniel Griscom
Thanks: submitted here: 
https://github.com/just-containers/s6-overlay/issues/350


I want to confirm my understanding of the problem. In both cases where 
there are unreaped zombies, the zombies properly have PPID 1, but PID 1 
is the execline "foreground" command, which doesn't look for and/or reap 
zombies. Would adding zombie reaping to "foreground" (probably by 
periodically calling wait_reap()) solve this problem?


(I understand that this would be an ugly feature to add to "foreground"; 
I just want to make sure I understand the problem.)



Thanks,
Dan


Laurent Bercot wrote on 7/22/21 5:34 AM:

So, I should raise this as a Github s6-overlay issue?


 Yes, please.

--
 Laurent



--
Daniel T. Griscom
152 Cochrane Street, Melrose, MA 02176-1433
(781) 662-9447  gris...@suitable.com  http://www.suitable.com/


Re: s6-svwait not reaping zombies?

2021-07-22 Thread Laurent Bercot

So, I should raise this as a Github s6-overlay issue?


 Yes, please.

--
 Laurent



Re: s6-svwait not reaping zombies?

2021-07-21 Thread Daniel Griscom

So, I should raise this as a Github s6-overlay issue?


Thanks,
Dan

Laurent Bercot wrote on 7/21/21 5:27 PM:


I'm using ubuntu:20.04 as a container using s6-overlay amd64 version 
2.2.0.3, which I believe has the latest s6. All runs on an Ubuntu 
18.04 system. It looks like s6-svscan sends SIGINT or SIGTERM to the 
processes, and then uses s6-svwait to wait for the processes to exit, 
but the zombie processes are never reaped.


 Hi Daniel,

 I'm actually not the maintainer of s6-overlay: John is. I think the
correct place to describe your issue is GitHub, where s6-overlay is
hosted.


 I am aware that there is a race condition problem with zombies in the
shutdown sequence of s6-overlay. This is not the first time it occurs
(at some point broken kernels were also causing similar troubles, but
this is probably not what is happening here).

 For instance, I know that the line at
https://github.com/just-containers/s6-overlay/blob/master/builder/overlay-rootfs/etc/s6/init/init-stage3#L53
is incorrect: s6-svwait cannot run correctly when the supervision tree
has been torn down, which is the case in init-stage3. This is why the
s6-svwait programs are waiting until they time out: even though the
services they're waiting for are down, they're never triggered because
the associated s6-supervise processes, which perform the triggers, are
already dead.

 Unfortunately, fixing this requires a significant rewrite of the
s6-overlay shutdown sequence. I have started working on this, but it has
been preempted by another project, and will likely not come out before
2022. I'm sorry; I would like to provide the correct shutdown sequence
you're looking for (and that is entirely possible to achieve with s6)
but as is, we have to make do with the current sequence.

 A tweak I would try is replacing the whole foreground block at lines
48-55 with the following: (without a foreground block)

backtick -D 3000 -n S6_SERVICES_GRACETIME { printcontenv S6_SERVICES_GRACETIME }
importas -u S6_SERVICES_GRACETIME S6_SERVICES_GRACETIME
wait -t ${S6_SERVICES_GRACETIME} { }

 This makes it so init-stage3 simply waits for all processes to die
before continuing, instead of waiting for a trigger that will never come.
It is not a long-term solution though, because having for instance a
shell on your container will make the "wait" command block until it
times out; but it may be helpful for your situation.

 Please open a GitHub issue to discuss this.

--
 Laurent



--
Daniel T. Griscom
152 Cochrane Street, Melrose, MA 02176-1433
(781) 662-9447  gris...@suitable.com  http://www.suitable.com/


Re: s6-svwait not reaping zombies?

2021-07-21 Thread Laurent Bercot




I'm using ubuntu:20.04 as a container using s6-overlay amd64 version 2.2.0.3, 
which I believe has the latest s6. All runs on an Ubuntu 18.04 system. It looks 
like s6-svscan sends SIGINT or SIGTERM to the processes, and then uses 
s6-svwait to wait for the processes to exit, but the zombie processes are never 
reaped.


 Hi Daniel,

 I'm actually not the maintainer of s6-overlay: John is. I think the
correct place to describe your issue is GitHub, where s6-overlay is
hosted.


 I am aware that there is a race condition problem with zombies in the
shutdown sequence of s6-overlay. This is not the first time it occurs
(at some point broken kernels were also causing similar troubles, but
this is probably not what is happening here).

 For instance, I know that the line at
https://github.com/just-containers/s6-overlay/blob/master/builder/overlay-rootfs/etc/s6/init/init-stage3#L53
is incorrect: s6-svwait cannot run correctly when the supervision tree
has been torn down, which is the case in init-stage3. This is why the
s6-svwait programs are waiting until they time out: even though the
services they're waiting for are down, they're never triggered because
the associated s6-supervise processes, which perform the triggers, are
already dead.

 Unfortunately, fixing this requires a significant rewrite of the
s6-overlay shutdown sequence. I have started working on this, but it has
been preempted by another project, and will likely not come out before
2022. I'm sorry; I would like to provide the correct shutdown sequence
you're looking for (and that is entirely possible to achieve with s6)
but as is, we have to make do with the current sequence.

 A tweak I would try is replacing the whole foreground block at lines
48-55 with the following: (without a foreground block)

backtick -D 3000 -n S6_SERVICES_GRACETIME { printcontenv S6_SERVICES_GRACETIME }
importas -u S6_SERVICES_GRACETIME S6_SERVICES_GRACETIME
wait -t ${S6_SERVICES_GRACETIME} { }

 This makes it so init-stage3 simply waits for all processes to die
before continuing, instead of waiting for a trigger that will never come.
It is not a long-term solution though, because having for instance a
shell on your container will make the "wait" command block until it
times out; but it may be helpful for your situation.
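
 The behaviour described above (wait for every child to die, but give
up after a grace time) can be sketched in Python as follows. This is an
illustration only: the function name wait_children is hypothetical, and
the polling loop is a simplification that is not claimed to match how
execline's wait program is implemented.

```python
import os
import time


def wait_children(timeout_ms):
    """Reap children until none remain or the timeout expires.

    Returns True if every child was reaped, False if the timeout was
    reached while at least one child was still alive. Hypothetical
    helper for illustration; it polls instead of sleeping on a signal.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return True  # no children left: everything has been reaped
        if pid == 0:
            # Some child is still running (for instance a shell).
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.01)
        # Otherwise a child was just reaped; loop to look for more.
```

 With a 3000 ms grace time, wait_children(3000) returns as soon as the
last child is gone, but a single long-lived child such as an interactive
shell keeps it blocked for the full 3 seconds, which is the limitation
mentioned above.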

 Please open a GitHub issue to discuss this.

--
 Laurent