Using s6-rc, I am not sure how to handle longrun failures. Say I have a
daemon which fails to start (e.g. missing library, cannot read its
config...). I don't want to start it again.
It sounds like you don't want to supervise this daemon. In that case,
run it as a oneshot that backgrounds itself, and make sure the parent
exits nonzero if the child doesn't succeed.
But if you do want to supervise it, keep reading:
For oneshot transitions the return code determines whether the
transition is successful or not. For longruns I see the only reason for
an up transition to fail is a timeout on readiness notification.
However I do not want to use a timeout in this case. Typically, in the
finish script of a longrun service, I would like to decide, based on
the return code or signal number, to put the service down.
That makes sense, and it's possible to do it at the s6 level (just call
s6-svc -d . in the finish script). However, from the s6-rc point of
view, you have asked a supervised service to transition from down to up,
so it will not stop trying until the service is actually up or it
times out.
My advice for now would be to:
1. write your ./finish script with an s6-svc -d . when you want to stop
restarting the daemon
2. set a reasonable timeout-up value in your s6-rc definition, so when
the daemon fails and ./finish tells s6 to stop restarting it, the
readiness notification never arrives and s6-rc eventually times out and
gives up. It's kind of ugly, but it's the best you can do for now.
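As a rough sketch of step 1, a ./finish script could look something like
the following. s6-supervise runs ./finish with the daemon's exit code as
the first argument (256 if it was killed by a signal) and the signal
number as the second. The should_stop helper and the list of "fatal"
exit codes (78, 126, 127) are only assumptions for illustration; pick
whatever codes your daemon really uses for unrecoverable failures.

```shell
#!/bin/sh
# ./finish sketch: $1 = exit code (256 if killed by a signal), $2 = signal number.

# Hypothetical policy: treat these exit codes as permanent failures.
#   78  = EX_CONFIG from sysexits.h (bad configuration)
#   126 = command found but not executable
#   127 = command not found
# Anything else, including death by signal (256), is left to be restarted.
should_stop() {
  case "$1" in
    78|126|127) return 0 ;;
    *) return 1 ;;
  esac
}

if should_stop "${1:-0}" "${2:-0}"; then
  # Tell s6-supervise not to restart the service in the current directory.
  exec s6-svc -d .
fi
```

For step 2, the service's source definition directory would also carry
a timeout-up file containing a number of milliseconds, so that s6-rc
gives up on the transition once the daemon has been put down this way.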
I will think about implementing a way for s6 to tell s6-rc to fail a
longrun transition instantly, without waiting for a timeout. It's a good
idea, thanks for mentioning it.