Would it make sense for s6-supervise to inform s6-svscan of its child process ID?
No. It is a deliberate design decision that s6-svscan does not look deeper into the service tree and only watches s6-supervise processes. The goal is to keep s6-svscan as simple as can be, and it is a high-value goal since s6-svscan is a suitable candidate for pid 1.

Any transmission of information from s6-supervise to s6-svscan would mean that s6-svscan now has to listen to N channels; any action to take depending on the transmitted data means that s6-svscan now has to implement some policy that is normally only the domain of s6-supervise. This adds significant complexity, with a non-negligible number of failure cases, just for the goal of aiding recovery in case an instance of s6-supervise dies, which, again, is something that has never happened yet.

If you have dying s6-supervise processes, this is the thing you need to fix first. The current s6 architecture will work in degraded mode, which is obviously not ideal but it will still work; as an admin, if something has killed one of your s6-supervise processes, you likely have bigger problems to deal with than the new s6-supervise not being able to start its service until the old one has died. If your old service is still alive, then *you are still serving*, and s6 is doing its job, despite something being very wrong on your machine.

s6-supervise death is a problem of perception, and of anxiety. I know. I feel that anxiety too. I wasn't sure how it was going to pan out when I released s6 that way. But after 14 years of use, and 11 years of daemontools use beforehand without a single supervisor death either, I can confidently say that it's going to be all right.

If I wanted to 100% prevent this from ever happening even in our worst nightmares, rather than adding transmission channels between s6-supervise and s6-svscan, I would simply write a single supervisor like perpd, that watches N services at a time. That is how a lot of init systems work, among which dinit, nitro, and of course systemd.
But cramming both s6-svscan and s6-supervise functionality into a single process results in more code complexity overall than the current s6 design, and I feel more confident (and less anxious) when minimizing complexity. The current implementation is pretty much optimal when it comes to the functionality/complexity ratio.

Additionally, having s6-svscan and s6-supervise so loosely coupled means that you can run an instance of s6-supervise in the wild, without it necessarily being linked to an s6-svscan supervision tree. This is an uncommon pattern, but it has come up once or twice for me (read: more often than a supervisor's untimely death), and there are probably users who rely on this being possible. I'd rather not break an existing workflow unless it is proven necessary.

-- Laurent
