Re: s6: something like runit's ./check script

Jan Bramkamp Tue, 08 Sep 2015 06:05:08 -0700

On 08/09/15 14:50, Laurent Bercot wrote:

On 08/09/2015 14:10, Jan Bramkamp wrote:

How would the ./run script or more likely the daemon it exec()ed into
die from a failed child process?


  The child process could s6-svc -t if it fails to find readiness, for
instance. There should be an option in the polling tool to kill the
daemon if the polling does not succeed.
  I went too far in saying "the run script will die": there needs to
be support for that, indeed. But "the service is stuck" problem is
easy to fix.
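
As an illustration of the polling idea above, here is a minimal sketch, not the actual s6 tooling. The helper names poll_until_ready and terminate_service, and the attempts/interval parameters, are hypothetical; the only real command assumed is s6-svc -t, which asks the supervisor to send SIGTERM to the service.

```python
import subprocess
import time


def poll_until_ready(check, attempts, interval, on_failure):
    """Call check() up to `attempts` times, sleeping `interval` seconds
    between tries. If it never returns True, call on_failure() once.
    (Hypothetical helper, not part of s6.)"""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(interval)
    on_failure()
    return False


def terminate_service(service_dir):
    """Ask s6-supervise to send SIGTERM to the daemon in service_dir."""
    subprocess.run(["s6-svc", "-t", service_dir], check=False)
```

A poller started from ./run could then call something like poll_until_ready(check, 30, 1.0, lambda: terminate_service(".")), where check is whatever readiness test the service needs.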

Not if something kills the polling script, e.g. a stray kill -9 $WRONG_PID.
Such things shouldn't happen, but that's why I want a supervision tree
rooted in init. If anything happens to a subtree, the supervisor for that
subtree restarts the subtree, and if something happens to the root of the
supervision tree (init), the kernel panics and a hardware watchdog
triggers within a few seconds. To let services fail and restart, the
infrastructure has to notice errors. Maybe adding an optional timeout
between forking the ./run script and the readiness notification to
s6-supervise would solve the problem without depending on other daemons.
Since such errors are expected to be very rare, a higher recovery time
(whatever the admin guessed as a worst-case startup time) would be
an appropriate trade-off if it avoids complexity. It would make sense to
signal this condition to the ./finish script and at least log it from there.
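
To make the timeout proposal concrete, here is a rough sketch of what the supervisor side could look like. It is not how s6-supervise is implemented. It assumes the daemon announces readiness by writing a newline to an inherited file descriptor, and the NOTIFY_FD environment variable and the function name are made up for this example.

```python
import os
import select
import subprocess
import time


def start_with_readiness_timeout(argv, timeout):
    """Spawn argv and wait up to `timeout` seconds for a newline on a
    notification pipe. Kill the child if readiness never arrives.
    Returns (ready, process). Hypothetical sketch, not s6 code."""
    r, w = os.pipe()
    env = dict(os.environ, NOTIFY_FD=str(w))  # NOTIFY_FD is an assumption
    proc = subprocess.Popen(argv, pass_fds=(w,), env=env)
    os.close(w)  # only the child keeps the write end open
    deadline = time.monotonic() + timeout
    ready = False
    try:
        while not ready:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # timed out
            readable, _, _ = select.select([r], [], [], remaining)
            if not readable:
                break  # timed out while waiting
            data = os.read(r, 64)
            if not data:
                break  # EOF: write end closed without a notification
            ready = b"\n" in data
    finally:
        os.close(r)
    if not ready:
        proc.kill()  # stuck or dead: force a restart cycle
        proc.wait()
    return ready, proc
```

The caller would log the failure and could hand the condition on to ./finish. Because the wait lives in the supervisor itself, there is no separate polling process that a stray kill could take out.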
