[sorry for replying late, catching up]

* Laurent Bercot <ska-supervis...@skarnet.org> [20160627 18:05]:
> On 27/06/2016 14:02, Joan Picanyol i Puig wrote:
> >However, couldn't they know whether their child did not cease to run
> >because of a signal they sent?
>
> I'm not sure about runsv, but s6-supervise is a state machine, and the
> service state only goes from UP to FINISH when the supervisor receives a
> SIGCHLD. The state does not change at all after the supervisor sent a
> signal: it sent a signal, yeah, so what - it's entirely up to the daemon
> what to do with that signal.
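
To check that I read the quoted description correctly, here is a minimal Python sketch of such a state machine. It is only a toy model under my own assumptions, not s6-supervise's actual code: the names ToySupervisor, on_spawn, on_signal_sent and on_sigchld are hypothetical, and the SIGSTOP exception mentioned further down is left out.

```python
import enum
import signal


class State(enum.Enum):
    DOWN = "down"
    UP = "up"
    FINISH = "finish"


class ToySupervisor:
    """Toy model of a supervisor's service state (all names hypothetical)."""

    def __init__(self):
        self.state = State.DOWN
        self.sent = []  # signals forwarded to the daemon, bookkeeping only

    def on_spawn(self):
        # The daemon was started successfully.
        self.state = State.UP

    def on_signal_sent(self, sig):
        # Forwarding a signal never changes the state: what the daemon
        # does with the signal is entirely up to the daemon.
        self.sent.append(sig)

    def on_sigchld(self):
        # Only the death of the child moves the service from UP to FINISH.
        if self.state is State.UP:
            self.state = State.FINISH


sup = ToySupervisor()
sup.on_spawn()
sup.on_signal_sent(signal.SIGTERM)
state_after_signal = sup.state    # unchanged by the signal
sup.on_sigchld()
state_after_sigchld = sup.state   # changed by the SIGCHLD
```

In this model the only transition out of UP is driven by on_sigchld; on_signal_sent merely records what was sent.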

I understand: supervisors only exec() processes and propagate signals;
they have no say in what effect a signal has, nor can they predict it.

> There's an exception for SIGSTOP because stopped daemons won't die
> before you SIGCONT them, but that's it; even sending SIGKILL won't
> make s6-supervise change states. Of course, if you send SIGKILL,
> you're going to receive a SIGCHLD very soon, and *that* will trigger a
> state change.

SIGKILL shares with SIGSTOP the property that it can't be caught (so
supervisors can assume a forthcoming SIGCHLD). Doesn't that signal (pun
intended) that the exception should be extended to SIGKILL?

> >No, but neither can the admin enforce this policy automatically and
> >portably using current supervisors. Other than the "dedicated user/login
> >class/cgroup" scheme proposed by Jan (which can be considered best
> >practice anyway), it'd be nice if they exposed this somehow (hand-waving
> >SMOP ahead: duplicate the pid field in ./status and remove the working
> >copy only when receiving a down signal).
>
> No need to duplicate the pid field: if s6-supervise dies before the service
> goes down, the pid field in supervise/status is left unchanged, so it still
> contains the correct pid. I suspect runsv works the same.

Ah, ok, it didn't occur to me that pid 0 in supervise/status could be
used to mean "never run or got SIGCHLD".

> I guess a partial mitigation strategy could be "if supervise/status exists
> and its pid field is nonzero when the supervisor starts, warn that an
> instance of the daemon may still be running and print its pid". Do you
> think it would be worth the effort?

The warning would make troubleshooting easier and might well have
avoided this thread. Beyond the warning, a robust, automation-friendly
interface (in s6-svstat / s6-svok) would round out this additional
feature and make it even more useful.

keep up the good work
--
pica