Re: s6-svwait not reaping zombies?
> I want to confirm my understanding of the problem. In both cases where there are unreaped zombies, the zombies properly have PPID 1, but PID 1 is the execline "foreground" command, which doesn't look for and/or reap zombies. Would adding zombie reaping to "foreground" (probably by periodically calling wait_reap()) solve this problem? (I understand that this would be an ugly feature to add to "foreground"; I just want to make sure I understand the problem.)

There are several unrelated issues and I haven't identified all of them; what I know is that they're all related to having to perform operations after s6-svscan has exec'ed (which means that 1. the supervision tree isn't operational anymore and 2. whatever is running as pid 1 may or may not be reaping zombies).

The issue that you are noticing is, as you correctly identified, 2: there are processes that died after s6-svscan exited its loop, and init-stage3, which is running as pid 1, is a sequence of programs that do not reap zombies. If no reaping at all is performed before the container is shut down, those zombies will remain. "foreground" is one of those programs, but you only see it in the ps list because its child, s6-svwait, is hanging until it times out, and that is due to 1.

Making foreground wait() would be a small, ad-hoc band-aid; the real solution is to make foreground's child stop hanging, and call execline's wait program as pid 1 before the container exits.

Again, I did perform some analysis a few months ago and came to the conclusion that writing a new version of s6-overlay would be less effort than patching the current version to high heavens. But since the new version won't be out for a while, some tweaks are definitely needed for the current version - but I'd rather delegate all of them to John. :)

--
Laurent
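For readers unfamiliar with the term, here is a minimal sketch of what "reaping" means at the system-call level, written in Python rather than execline. It is only an illustration: the names spawn_short_lived_children and reap_zombies are hypothetical and are not part of s6, execline, or s6-overlay. A child that has exited stays in the process table as a zombie until its parent calls one of the wait() family; a pid 1 that never does so leaves the zombies in place, which is the situation described above.

```python
import os
import time


def spawn_short_lived_children(count):
    """Fork `count` children that exit immediately; return their pids."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once. It stays a zombie until the parent waits.
            os._exit(0)
        pids.append(pid)
    return pids


def reap_zombies():
    """Collect every already-dead child without blocking; return their pids."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    spawn_short_lived_children(3)
    time.sleep(0.5)  # give the children time to exit and become zombies
    print("reaped:", reap_zombies())
```

A process running as pid 1 would have to call something like reap_zombies() at some point before exiting; none of the programs chained in init-stage3 does this on its own.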
Re: s6-svwait not reaping zombies?
Thanks: submitted here: https://github.com/just-containers/s6-overlay/issues/350

I want to confirm my understanding of the problem. In both cases where there are unreaped zombies, the zombies properly have PPID 1, but PID 1 is the execline "foreground" command, which doesn't look for and/or reap zombies. Would adding zombie reaping to "foreground" (probably by periodically calling wait_reap()) solve this problem? (I understand that this would be an ugly feature to add to "foreground"; I just want to make sure I understand the problem.)

Thanks,
Dan

Laurent Bercot wrote on 7/22/21 5:34 AM:
>> So, I should raise this as a Github s6-overlay issue?
>
> Yes, please.
>
> --
> Laurent

--
Daniel T. Griscom
152 Cochrane Street, Melrose, MA 02176-1433
(781) 662-9447
gris...@suitable.com
http://www.suitable.com/
Re: s6-svwait not reaping zombies?
> So, I should raise this as a Github s6-overlay issue?

Yes, please.

--
Laurent
Re: s6-svwait not reaping zombies?
So, I should raise this as a Github s6-overlay issue?

Thanks,
Dan

Laurent Bercot wrote on 7/21/21 5:27 PM:
>> I'm using ubuntu:20.04 as a container using s6-overlay amd64 version 2.2.0.3, which I believe has the latest s6. All runs on an Ubuntu 18.04 system. It looks like s6-svscan sends SIGINT or SIGTERM to the processes, and then uses s6-svwait to wait for the processes to exit, but the zombie processes are never reaped.
>
> Hi Daniel,
>
> I'm actually not the maintainer of s6-overlay: John is. I think the correct place to describe your issue is GitHub where s6-overlay is hosted.
>
> I am aware that there is a race condition problem with zombies in the shutdown sequence of s6-overlay. This is not the first time it occurs (at some point broken kernels were also causing similar troubles, but this is probably not what is happening here). For instance, I know that the line at https://github.com/just-containers/s6-overlay/blob/master/builder/overlay-rootfs/etc/s6/init/init-stage3#L53 is incorrect: s6-svwait cannot run correctly when the supervision tree has been torn down, which is the case in init-stage3. This is why the s6-svwait programs are waiting until they time out: even though the services they're waiting for are down, they're never triggered because the associated s6-supervise processes, which perform the triggers, are already dead.
>
> Unfortunately, fixing this requires a significant rewrite of the s6-overlay shutdown sequence. I have started working on this, but it has been preempted by another project, and will likely not come out before 2022. I'm sorry; I would like to provide the correct shutdown sequence you're looking for (and that is entirely possible to achieve with s6) but as is, we have to make do with the current sequence.
>
> A tweak I would try is replacing the whole foreground block at lines 48-55 with the following: (without a foreground block)
>
>     backtick -D 3000 -n S6_SERVICES_GRACETIME { printcontenv S6_SERVICES_GRACETIME }
>     importas -u S6_SERVICES_GRACETIME S6_SERVICES_GRACETIME
>     wait -t ${S6_SERVICES_GRACETIME} { }
>
> This makes it so init-stage3 simply waits for all processes to die before continuing, instead of waiting for a trigger that will never come. It is not a long-term solution though, because having for instance a shell on your container will make the "wait" command block until it times out; but it may be helpful for your situation.
>
> Please open a GitHub issue to discuss this.
>
> --
> Laurent

--
Daniel T. Griscom
152 Cochrane Street, Melrose, MA 02176-1433
(781) 662-9447
gris...@suitable.com
http://www.suitable.com/
Re: s6-svwait not reaping zombies?
> I'm using ubuntu:20.04 as a container using s6-overlay amd64 version 2.2.0.3, which I believe has the latest s6. All runs on an Ubuntu 18.04 system. It looks like s6-svscan sends SIGINT or SIGTERM to the processes, and then uses s6-svwait to wait for the processes to exit, but the zombie processes are never reaped.

Hi Daniel,

I'm actually not the maintainer of s6-overlay: John is. I think the correct place to describe your issue is GitHub where s6-overlay is hosted.

I am aware that there is a race condition problem with zombies in the shutdown sequence of s6-overlay. This is not the first time it occurs (at some point broken kernels were also causing similar troubles, but this is probably not what is happening here). For instance, I know that the line at https://github.com/just-containers/s6-overlay/blob/master/builder/overlay-rootfs/etc/s6/init/init-stage3#L53 is incorrect: s6-svwait cannot run correctly when the supervision tree has been torn down, which is the case in init-stage3. This is why the s6-svwait programs are waiting until they time out: even though the services they're waiting for are down, they're never triggered because the associated s6-supervise processes, which perform the triggers, are already dead.

Unfortunately, fixing this requires a significant rewrite of the s6-overlay shutdown sequence. I have started working on this, but it has been preempted by another project, and will likely not come out before 2022. I'm sorry; I would like to provide the correct shutdown sequence you're looking for (and that is entirely possible to achieve with s6) but as is, we have to make do with the current sequence.

A tweak I would try is replacing the whole foreground block at lines 48-55 with the following: (without a foreground block)

    backtick -D 3000 -n S6_SERVICES_GRACETIME { printcontenv S6_SERVICES_GRACETIME }
    importas -u S6_SERVICES_GRACETIME S6_SERVICES_GRACETIME
    wait -t ${S6_SERVICES_GRACETIME} { }

This makes it so init-stage3 simply waits for all processes to die before continuing, instead of waiting for a trigger that will never come. It is not a long-term solution though, because having for instance a shell on your container will make the "wait" command block until it times out; but it may be helpful for your situation.

Please open a GitHub issue to discuss this.

--
Laurent
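As a rough illustration of the "wait for every child to die, but give up after a grace period" behavior that this tweak relies on, here is a Python sketch. It is an analogy under stated assumptions, not a description of how execline's wait is implemented: wait_for_children is a hypothetical name, and the 10 ms polling interval is arbitrary. The 3000 ms value in the usage line mirrors the default passed to backtick -D in the tweak.

```python
import os
import time


def wait_for_children(timeout_ms, poll_interval=0.01):
    """Reap children until none remain or timeout_ms has elapsed.

    Returns True if every child was reaped, False if the timeout hit
    while at least one child was still alive.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return True  # no children left: everything has been reaped
        if pid == 0:
            # Some child is still running (e.g. a shell left open).
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
        # Otherwise a child was reaped; loop again to collect the next one.


if __name__ == "__main__":
    if os.fork() == 0:
        os._exit(0)  # child exits immediately
    print("all children gone:", wait_for_children(3000))
```

The False branch corresponds to the caveat in the message: a long-lived process such as an interactive shell keeps the wait blocked until the grace time runs out.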