On 17/09/2015 07:43, Colin Booth wrote:
> I thought moves (directory or otherwise) were atomic.

Moves you can perform with rename(), yes, they are. But when the longrun is kept up during the update, you need to copy a whole new set of files (the contents of the new service directory) into the old service directory, which you have to keep in order not to break its identification by the supervision tree. So it's not just a rename, it's modifying a whole set of files in a *live* service directory. And there's no doing that atomically.

I disable the supervisor for the directory update, so if the service dies at the wrong time, at least it will not be restarted until its set of files is consistent again. But there's no such protection for the ./finish script, so there are serious aerobatics involved to minimize the window where ./finish is invoked while its service directory is wildly changing.

I could theoretically add a control command to s6-supervise to make it delay the execution of ./finish. But I don't think it would be worth it: it adds significant risks (what if a process sends a "block" command, then dies or otherwise fails to send an "unblock" command?), and complexity, for an extreme corner case that will probably never happen.

If a ./finish failure is critical, the user should simply tell s6-rc-update to restart the service, which is 100% safe because the service directory will then be updated offline instead of live.

--
Laurent
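
To illustrate the difference between one atomic rename() and updating a set of files in place, here is a minimal Python sketch. It is not how s6-rc-update is implemented: the helper names (replace_file_atomically, update_live_dir, snapshot, demo) and the "old"/"new" file contents are made up for the example, and it only handles a flat directory of regular files. Each file is swapped atomically, yet an observer still sees a directory that mixes old and new files partway through.

```python
import os
import shutil
import tempfile


def replace_file_atomically(src, dst):
    """Copy src next to dst, then rename it over dst (atomic for this one file)."""
    tmp = dst + ".tmp"
    shutil.copy2(src, tmp)
    os.replace(tmp, dst)  # rename() semantics: dst is either the old or the new file


def snapshot(path):
    """Return {file name: contents} for every file currently in path."""
    result = {}
    for name in sorted(os.listdir(path)):
        with open(os.path.join(path, name)) as f:
            result[name] = f.read()
    return result


def update_live_dir(new_dir, live_dir, observe=None):
    """Update live_dir in place from new_dir, one file at a time.

    The directory itself is never renamed, so whatever identifies it keeps
    working, but the set of files passes through intermediate states.
    observe, if given, is called with a snapshot after each file is replaced.
    Files that exist only in live_dir are left alone (hypothetical simplification).
    """
    for name in sorted(os.listdir(new_dir)):
        replace_file_atomically(os.path.join(new_dir, name),
                                os.path.join(live_dir, name))
        if observe is not None:
            observe(snapshot(live_dir))


def demo():
    """Build an old and a new directory, update in place, return the observed states."""
    root = tempfile.mkdtemp()
    try:
        live = os.path.join(root, "live")
        new = os.path.join(root, "new")
        os.mkdir(live)
        os.mkdir(new)
        for name in ("run", "finish"):
            with open(os.path.join(live, name), "w") as f:
                f.write("old")
            with open(os.path.join(new, name), "w") as f:
                f.write("new")
        seen = []
        update_live_dir(new, live, seen.append)
        return seen
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    for state in demo():
        print(state)
```

The first state that demo() records has a new finish next to an old run: that is the kind of inconsistent window described above, and no amount of per-file atomicity removes it.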
