Hello Laurent,

Thanks for your answers. Other opinions/experiences are welcome.
> That is a fair point. Normally, you should adjust the s6-rc
> timeouts (both the global one and the service-specific one) to
> make sure s6-rc does *not* time out before the service is ready -
> but if there's an unexpected significant delay, the situation can
> happen.

Just to be clear, I am talking about a service going into an infinite
loop or deadlock. Obviously that is a bad service, but I want to
protect my system against it.

> What I can do is add an option to s6-rc to make it explicitly send
> a s6-svc -d to a service that times out before reaching readiness:
> ensure that a service is either ready in time, or definitely down.
> Would that help?

Yes, that would help. I suppose you also mean waiting for the service
to go down before returning?

> The annoying thing is it can't be symmetrical: when a down
> transition times out, there's no way I'm going to start the service
> again. :) But generally, a down transition timing out signifies a
> badly written finish script, or badly calibrated timeouts, and
> it can be easily solved by running s6-rc -d change again.

I agree. I would add that if timeout-down > timeout-kill +
timeout-finish + some margin, the down transition should generally
never time out.

> What I can do is add a bit of signal handling to s6-rc, so that if
> it gets interrupted, say with a SIGINT or SIGTERM, it exits ASAP,
> while still ensuring consistency of the service states.

I was thinking exactly the same. :) I even think this could be
tailored to system shutdown (I do not see another use case). For
example, for ongoing longrun up transitions, s6-rc could act as if the
transition had timed out and send "s6-svc -d". For ongoing longrun
down transitions, I am not sure whether it should wait for them to
complete or not.
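To make the proposed behaviour concrete, here is a minimal Python sketch of an up transition that ends with the service either ready in time or definitely down. It treats "the deadline expired" and "we were interrupted (e.g. SIGTERM at shutdown)" the same way. This is not s6-rc code. The `Service` interface, `wait_until` and `bring_up` are hypothetical names, and `send_down()` merely stands in for what would be an `s6-svc -d` on the real service directory.

```python
import time
from typing import Callable, Protocol


class Service(Protocol):
    """Hypothetical view of a longrun; not an s6 or s6-rc API."""

    def is_ready(self) -> bool: ...
    def is_down(self) -> bool: ...
    def send_down(self) -> None: ...  # stands in for `s6-svc -d servicedir`


def wait_until(cond: Callable[[], bool], deadline: float,
               interrupted: Callable[[], bool], poll: float = 0.01) -> bool:
    """Poll cond until it holds; give up at the deadline or on interrupt."""
    while not cond():
        if interrupted() or time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def bring_up(svc: Service, timeout_up: float, timeout_down: float,
             interrupted: Callable[[], bool] = lambda: False) -> str:
    """Return "up" if the service became ready in time. Otherwise bring it
    down and return "down", or "unknown" if the down transition also times out."""
    if wait_until(svc.is_ready, time.monotonic() + timeout_up, interrupted):
        return "up"
    # Timed out or interrupted: handle both cases the same way.
    svc.send_down()
    # Wait for the service to actually be down before returning. Further
    # interrupts are ignored here so that the final state is known.
    if wait_until(svc.is_down, time.monotonic() + timeout_down, lambda: False):
        return "down"
    # As noted above, a timed-out down transition cannot be undone.
    # Leave it to a later "s6-rc -d change".
    return "unknown"
```

Treating an interrupt exactly like a timeout keeps the shutdown case simple: the state after the call is the same as after a timed-out up transition. In a real program, `interrupted` could read a flag set by a SIGINT/SIGTERM handler (for instance one installed with Python's `signal.signal`). This sketch always waits for the down transition to finish, which is only one possible answer to the open question about in-flight down transitions.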
> Unfortunately, for oneshots it would mean waiting for the current
> transitions to finish before exiting - s6-rc has no way to interrupt
> a running oneshot, and adding one (making s6rc-oneshot-runner kill
> all its children) would not help, because until the oneshot script
> exits, it is not visible from the outside whether it has accomplished
> its transition or not - so the state would still be undetermined.

I tend to think this would not be too much of a problem, as I picture
oneshots as having timeout-up and timeout-down values of a few
seconds, as opposed to longruns having timeouts of one or two minutes.
But this assumption may be totally wrong.

> Also, state consistency cannot be 100% ensured, because s6-rc could
> still receive a SIGKILL - but if you kill -9 s6-rc, you deserve
> trouble.

I won't kill -9 s6-rc, I promise.

Kr,
Lionel