Re: Propagating parent PID to ./run

Laurent Bercot Tue, 04 Jan 2022 10:33:36 -0800

If getppid() returns 1, it means the service has already been orphaned.


I don't think that's guaranteed by POSIX.

Apparently another data point: 
https://gist.github.com/gsauthof/8c8406748e536887c45ec14b2e476cbc


 I thought it was, but apparently you're right:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/_exit.html

"The parent process ID of all of the existing child processes and zombie processes of the calling process shall be set to the process ID of an implementation-defined system process. That is, these processes shall be inherited by a special system process."


 That "special system process" is pid 1 on every Unix on Earth and in
Heaven, but of course systemd, being from Hell, had to be the extra
special snowflake (that for some reason doesn't melt in Hell) and do
things differently, for no reason at all.

 Subreapers without pid namespaces are not only useless, they are
*actively harmful*. 🤬

 Well, there goes this idea. But I'm *not* adding code to s6 to deal
with this systemd idiosyncrasy ('syncra' being optional); I don't
want to encourage parent detection hacks anyway.

What are your thoughts for this specific scenario? My understanding is that the 
supervisor would be relaunched, and another instance of the service would be 
started. I'd like to avoid/deal with the situation of the evil-twin service 
instance.


 Any reasonably written service will lock a resource at start, before
becoming ready; if the old instance is still around, it will try and
fail to lock the resource. Either it dies and will be restarted one
second later, until an admin finds and kills the old instance; or it
blocks on the lock, using no cpu, until the old instance is killed.
Since the goal of s6 is to maximize uptime even in a degraded state,
s6 takes no special action for that - but you could have a ./finish
script that sends a special alert when the service fails several times
in a few seconds. (I think by default it's better to have that degraded
state than to have service downtime that is not explicitly prompted by
an admin.)
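
For illustration, here is a minimal sketch of the "lock a resource at start" pattern in its fail-fast form, where the new instance exits if the old one still holds the lock. It is not taken from s6 or from any particular service; the lock path and the function names are hypothetical, and it assumes a Unix system where flock() is available.

```python
import fcntl
import os
import sys


def acquire_instance_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns an open file descriptor that holds the lock, or None if
    another open file description (e.g. an old instance) holds it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock: give up instead of waiting.
        os.close(fd)
        return None
    return fd


def main(lock_path="/run/myservice.lock"):  # hypothetical path
    fd = acquire_instance_lock(lock_path)
    if fd is None:
        sys.stderr.write("old instance still holds the lock, exiting\n")
        return 1  # the supervisor would restart us a bit later
    # ... become ready and do the actual work here, keeping fd open
    # for the whole lifetime of the process ...
    return 0
```

The lock goes away when the holding process dies, so once the old instance is killed, the next restart of the new one succeeds without any cleanup step.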

 If the service doesn't lock any resource, then you have lock-fd,
which was designed to handle this - aaaand the documentation is
inaccurate, I'll fix it. The behaviour is that the new instance *blocks*
until the old instance is dead; s6-supervise writes a warning message
to its own stderr so the situation is detected. (Retrying after 60
seconds only happens in a few unlikely error situations.)
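
The blocking behaviour described above can be sketched generically as follows. This is not how s6-supervise implements lock-fd, only an illustration of a new instance waiting on a lock until the old holder lets go; the function names are hypothetical, a thread stands in for the new instance, and flock() is assumed to be available.

```python
import fcntl
import os
import threading
import time


def wait_for_lock(path):
    """Block until an exclusive lock on `path` is obtained; return the fd."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX)  # sleeps here, using no CPU
    return fd


def demo(path, hold_seconds=0.2):
    """Simulate an old instance holding the lock while a new one waits."""
    events = []
    old_fd = wait_for_lock(path)  # the "old instance" holds the lock

    def new_instance():
        fd = wait_for_lock(path)  # blocks until the old holder is gone
        events.append("new instance started")
        os.close(fd)

    t = threading.Thread(target=new_instance)
    t.start()
    time.sleep(hold_seconds)  # old instance lingers for a while
    events.append("old instance exited")
    os.close(old_fd)  # closing the fd releases the lock
    t.join()
    return events
```

Calling demo() on a scratch file returns the two events in order, the old instance's exit first and the new instance's start second, which is the ordering the lock is there to enforce.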

--
 Laurent
