On 27/06/2016 14:02, Joan Picanyol i Puig wrote:
However, couldn't they know whether their child did not cease to run because of a signal they sent?
I'm not sure about runsv, but s6-supervise is a state machine, and the service state only goes from UP to FINISH when the supervisor receives a SIGCHLD. The state does not change at all after the supervisor has sent a signal: it sent a signal, yeah, so what - it's entirely up to the daemon what to do with that signal. There's an exception for SIGSTOP because stopped daemons won't die before you SIGCONT them, but that's it; even sending SIGKILL won't make s6-supervise change states. Of course, if you send SIGKILL, you're going to receive a SIGCHLD very soon, and *that* will trigger a state change.
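
To make that concrete, here is a toy model of the behaviour described above. It is only a sketch, not s6-supervise's actual code: the class and method names are made up for illustration, and tracking SIGSTOP with a "paused" flag is just an assumption about one way to model that exception.

```python
from enum import Enum, auto


class State(Enum):
    UP = auto()
    FINISH = auto()


class SupervisorModel:
    """Hypothetical model of the state logic described above."""

    def __init__(self):
        self.state = State.UP
        self.paused = False  # assumed way to model the SIGSTOP exception

    def sent_signal(self, signame):
        # Sending a signal never moves the service out of UP;
        # what the daemon does with it is the daemon's business.
        if signame == "SIGSTOP":
            self.paused = True   # a stopped daemon won't die until SIGCONT
        elif signame == "SIGCONT":
            self.paused = False
        # Every other signal, SIGKILL included: nothing is recorded.

    def got_sigchld(self):
        # Only the notification that the child died triggers UP -> FINISH.
        if self.state is State.UP:
            self.state = State.FINISH


model = SupervisorModel()
model.sent_signal("SIGKILL")
state_after_kill = model.state      # still UP: sending changed nothing
model.got_sigchld()
state_after_sigchld = model.state   # now FINISH
```

The point of the model is that `sent_signal` has no path to FINISH at all; only `got_sigchld` does.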
No, but neither can the admin enforce this policy automatically and portably using current supervisors. Other than the "dedicated user/login class/cgroup" scheme proposed by Jan (which can be considered best practice anyway), it'd be nice if they exposed this somehow (hand-waving SMOP ahead: duplicate the pid field in ./status and remove the working copy only when receiving a down signal).
No need to duplicate the pid field: if s6-supervise dies before the service goes down, the pid field in supervise/status is left unchanged, so it still contains the correct pid. I suspect runsv works the same.

I guess a partial mitigation strategy could be "if supervise/status exists and its pid field is nonzero when the supervisor starts, warn that an instance of the daemon may still be running and print its pid". Do you think it would be worth the effort?

--
Laurent
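
A rough sketch of what that startup check could look like follows. It assumes the pid has already been extracted from supervise/status (the parsing is left out on purpose, and the function names are hypothetical). It probes the pid with signal 0, which tests for existence without delivering anything. Pids get recycled, so a positive answer only means "something has that pid", hence "may still be running".

```python
import os


def pid_is_alive(pid):
    """Return True if some process currently has this pid."""
    if pid <= 0:
        return False  # a zero pid field means no instance was recorded
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True


def stale_instance_warning(pid):
    """Return a warning string for a leftover pid, or None if there is none."""
    if pid_is_alive(pid):
        return (
            "warning: an instance of the daemon may still be running "
            f"(pid {pid})"
        )
    return None


# Example: the current process certainly exists, pid 0 means "nothing recorded".
own_warning = stale_instance_warning(os.getpid())
no_warning = stale_instance_warning(0)
```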