Re: [s6-rc] How to handle longrun failures

Laurent Bercot Thu, 02 Mar 2017 06:58:19 -0800

Using s6-rc, I am not sure how to handle longrun failures. Say I have a
daemon which fails to start (e.g. missing library, cannot read its
config...). I don't want to start it again.


 It sounds like you don't want to supervise this daemon. In that case,
run it as a oneshot that backgrounds itself, and make sure the parent
exits nonzero if the child doesn't succeed.
 But if you do want to supervise it, keep reading:

For oneshot transitions the return code determines whether the
transition is successful or not. For longruns I see the only reason for
an up transition to fail is a timeout on readiness notification.
However I do not want to use a timeout in this case. Typically, in the
finish script of a longrun service, I would like to decide, based on
the return code or signal number, to put the service down.


 That makes sense, and it's possible to do it at the s6 level (just call
s6-svc -d . in the finish script). However, from the s6-rc point of
view, you have asked a supervised service to transition from down to up,
so it will not stop trying until the service is actually up or it
times out.

 My advice for now would be to:
1. write your ./finish script with a s6-svc -d when you want to stop
restarting the daemon

2. set a reasonable timeout-up value in your s6-rc definition, so when
the daemon fails and ./finish tells s6 to stop restarting it, the
notification never arrives and s6-rc eventually times out and gives up.

 It's kind of ugly, but it's the best you can do for now.
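
 As an illustration of step 1, here is a minimal sketch of such a
./finish script. It relies on s6-supervise passing the exit code of
./run as the first argument (256 if ./run was killed by a signal) and
the signal number as the second. The function name should_give_up and
the particular codes treated as permanent failures (126, 127, 78,
SIGSEGV) are assumptions chosen for the example, not anything s6
prescribes; adapt them to what your daemon actually returns.

```shell
#!/bin/sh
# Hypothetical ./finish sketch for a supervised longrun.
# $1: exit code of ./run (256 if ./run was killed by a signal)
# $2: signal number that killed ./run, if any

# Return 0 (true) when the failure looks permanent, so restarting is
# pointless. The codes listed here are illustrative assumptions.
should_give_up() {
    exit_code=$1
    signal=$2
    case "$exit_code" in
        126|127) return 0 ;;  # not executable, not found, or missing library
        78)      return 0 ;;  # EX_CONFIG from sysexits.h, if the daemon uses it
        256)     [ "$signal" -eq 11 ] && return 0 ;;  # SIGSEGV (11 on Linux)
    esac
    return 1
}

if should_give_up "${1:-0}" "${2:-0}"; then
    # ./finish runs in the service directory, so "." names this service.
    # This tells s6-supervise to stop restarting the daemon.
    s6-svc -d .
fi
```

 For step 2, the timeout goes in a timeout-up file in the service's
s6-rc source definition directory, as a number of milliseconds (for
instance 10000, an arbitrary value). Once ./finish has put the service
down, s6-rc still waits that long before declaring the transition
failed.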

 I will think about implementing a way for s6 to tell s6-rc to fail a
longrun transition instantly, without waiting for a timeout. It's a good
idea, thanks for mentioning it.

--
 Laurent
