On Sun, 16 Jul 2017 08:41:03 +0000 "Laurent Bercot" <ska-skaw...@skarnet.org> wrote:
> As I told Jesse on IRC, the patch isn't going in. I'm not including > OS-specific code into s6, even with a compile-time option. The main > reason for it is that it changes the API: the choice to spawn the > service in a new namespace or not should be made at run time, so > it would introduce a new file in the service directory that would only > be valid under Linux, and the file would need to be supported by > s6-rc and friends even on other systems, etc. This is exactly the kind > of complexity created by OS divergences that plagues the Unix world > and that I very much want to avoid. This change itself looks quite > simple, but it would be a precedent and the slope is extremely > slippery. > > > >Though as Jesse explained, this requires some sort of exit/signal > >proxing, which isn't the case here. Here the direct child of > >s6-supervise remains the daemon itself - in its own pid ns - which is > >much better. > It would unarguably be more elegant, yes, but since there's a way to > do without it, it's only about elegance, not feasability - and I > really think the cost for elegance is too high. > > execline's 'trap' binary can adequately perform the needed proxying > at low resource cost. There's a reliability issue as well, one thing trap won't be able to do is relay KILL and other uncatchable signals. The proxy and the service aren't completely sharing the same fate as each other, which may break users expectations when they run, e.g. s6-svc -k. This fate sharing mismatch is similar to how s6-sudod may keep running despite s6-sudoc's death. To reliably deliver a KILL signal to the service without re-introducing pid races, the proxy would need a non-fatal way to know to send the signal, requiring a new API. Improving on execline's trap program to handle this would probably transform it into a version of daemontools supervise that exits rather than respawns. If there were no interposing process, an added benefit is that killing the pid 1 in a namespace also kills all the other processes in that namespace. This is a secondary thought, but we get it for free. Using pid namespaces isn't necessary to reliably kill a set of processes on Linux, another technique is to use a cgroup freezer, which essentially puts a write lock on a set of processes, so that an unrelated process can send signals without the TOCTOU races. > If more various namespace feature requests come in, at some point I > will look for a way to integrate some namespace functions into > skalibs, with proper compile-time guards and stubs, and then > reconsider; but as long as there are ways to achieve the desired > outcome with external tools, it's not a priority. Unsharing everything, except for the pid namespace, should take effect immediately on the calling process. CLONE_NEWNS is special since the pid 1 effect gets applied to the process' first child, rather than the process itself. This means the unshare tool from util-linux should be able to do everything else in the usual exec chainloading style. Alternative to patching s6-supervise, what do you think about adding an option to spawn a different supervisor from s6-svscan? An example API could be, s6-svscan will spawn "./supervisor" for each service dir it tracks, if it doesn't exist, it just spawns s6-supervise as usual. Note that having ./supervisor consist of "unshare -p s6-supervise" will prevent s6-supervise from re-spawning its supervised child. fork() will succeed the first time, but error with ENOMEM on subsequent calls unless unshare(CLONE_NEWPID) is called again. In fact, the patch could probably avoid refactoring the child into a continuation by doing unshare() ; fork() ; rather than clone().