On 08/09/2015 15:04, Jan Bramkamp wrote:
> Not if something kills the polling script e.g. stray kill -9 $WRONG_PID.
Yeah, yeah. It's a question of risk assessment. We supervise long-lived processes because in the course of their lives, they may receive a stray signal, but more likely, they may die from a bug or a temporary error. Stray signals are actually very rare. The likelihood of a short-lived process receiving a stray signal is very, very low. If it were any higher, Unix simply would not work, because you could never count on any process staying alive long enough to actually perform its job. Well, that's not the case: processes don't die on a whim; they die because they're buggy or they lack resources. We supervise processes when the cost of not supervising them is higher than the cost of supervising them. It makes sense for daemons. It would not make sense for a polling process.

Say you're incredibly unlucky and a stray signal hits your poller. What happens then? Your daemon doesn't get killed, and it's not ready. Tough. The same situation can happen with any daemon you don't poll for readiness. If a daemon uses notification and gets stuck, well, it never notifies readiness, and that's it. It doesn't get killed for it.

If you, as an admin, estimate that it's a risk you cannot take, i.e. the probability of your daemon getting stuck multiplied by the cost of the consequences is too high a number, then you should do something about it. And the great thing is that you already can. Set up a listener on the service's notification channel that kills the daemon when too much time elapses between the 'u' and the 'U' event. Done.

Another possibility, if your daemon is critical and you have a poller for it: set up a long-lived monitor for your daemon, and restart it whenever the monitor fails, without even using the s6 notification channel. When you have a service that's critical enough to make you want to protect against stray signals hitting short-lived processes, that's the kind of thing you want to do anyway.
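The 'u'-to-'U' watchdog boils down to a timer armed on 'u' and disarmed on 'U'. Below is a minimal sketch of just that decision logic, in Python, over an already-collected list of timestamped events. The function name, the (timestamp, character) event layout, and the choice to let 'd' (daemon died) also disarm the timer are assumptions for illustration, not s6's interface. A real listener would consume events as they arrive on the service's event channel (e.g. via s6-svwait) and send the kill through s6-svc.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical event record: (timestamp in seconds, event character).
# Assumption: 'u' = daemon started, 'U' = daemon notified readiness,
# 'd' = daemon died; any other character is ignored by this sketch.
Event = Tuple[float, str]


def find_stuck_starts(events: Iterable[Event], timeout: float,
                      now: float) -> List[float]:
    """Return the times of 'u' events that were not followed by 'U'
    within `timeout` seconds, i.e. the starts a watchdog would kill."""
    stuck: List[float] = []
    pending: Optional[float] = None  # time of the last unanswered 'u'

    def expire_if_overdue(t: float) -> None:
        nonlocal pending
        # The watchdog would have fired at pending + timeout, before t.
        if pending is not None and t - pending > timeout:
            stuck.append(pending)
            pending = None  # the kill ends this start attempt

    for t, ev in sorted(events):
        expire_if_overdue(t)
        if ev == "u":
            pending = t          # arm the timer
        elif ev in ("U", "d"):
            pending = None       # ready (or dead): disarm the timer
    expire_if_overdue(now)       # a start may still be hanging right now
    return stuck
```

For example, with a 5-second budget, a start at 11.0 that only reports readiness at 30.0 would be flagged, while a start at 0.0 that is ready at 2.0 would not.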
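The short-lived poller on the other side of this trade-off is small. Here is a minimal sketch in Python, assuming the s6 readiness convention of writing a single newline to the notification fd and then closing it. `poll_then_notify`, its parameters, and the `check` callable (standing in for running ./check) are hypothetical names chosen for illustration.

```python
import os
import time
from typing import Callable


def poll_then_notify(check: Callable[[], bool], notify_fd: int,
                     interval: float = 0.5, attempts: int = 20,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` up to `attempts` times, `interval` seconds apart.
    On the first success, write a newline to `notify_fd` and return
    True. If every attempt fails, write nothing and return False.
    The fd is closed in both cases."""
    try:
        for i in range(attempts):
            if check():
                os.write(notify_fd, b"\n")  # readiness notification
                return True
            if i + 1 < attempts:
                sleep(interval)             # wait before the next poll
        return False
    finally:
        os.close(notify_fd)
```

If this process is killed halfway through, nothing is written and the daemon simply stays up but not ready, which is the same outcome as a notifying daemon that gets stuck.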
You're not going to rely on a ./check poll at the beginning of the run script. For everything else, a short-lived background process is more than enough.

-- 
Laurent
