On Thu, Aug 21, 2025 at 5:45 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> One other thought here: do we *really* want such a critical-and-hard-
> to-test aspect of our behavior to be handled completely differently
> on different platforms? I'd lean to ignoring the Linux/FreeBSD
> facilities, because otherwise we're basically doubling our testing
> problems in exchange for not much.

Yeah. The attraction is that it's extremely simple and reliable: set-and-forget, adding one line that sends you into well-tested immediate shutdown code. Combined with the fact that most of our user base has it, that seemed attractive. The reliability aspects I was thinking of are: (1) the kernel's knowledge of the process tree is infallible by definition, (2) it's handled asynchronously on postmaster exit, not after a POLLHUP, EVFILT_PROC, or process HANDLE event that must be consumed synchronously by at least one child.

For (2), in practice I think it's close to 100% certain that at least one backend will currently or very soon be in WaitEventSetWait() and thus drive the cleanup operation, and I think that's probably good enough. For example, even if your backends are all busy, there's basically always a bunch of "launchers" and other auxiliary processes ready and waiting to deal with it. But it's possible to dream up extreme theoretical scenarios where that bet fails: imagine if every single backend except for one is currently waiting for a lock in sem_wait() (let's say it's the same lock for simplicity). I previously said in some throwaway comment that they can't all be blocked in sem_wait() or you already have a deadlock (a programming bug that isn't this system's fault), but if the postmaster AND the backend that holds the lock are killed by the OOM killer, you lose. Those backends would need to be cleaned up manually by an administrator in all released versions of PostgreSQL, and it'd be no better with the v1 patch on Windows and macOS. They'd all eat SIGQUIT on a Linux or FreeBSD system with the v1 patch, so on paper at least it's more hole-proof.

I agree that it would be nice to have just one system though, and of course to make it completely reliable everywhere without complicated theories. One argument I thought of against PROC_PDEATHSIG_CTL is that its simplicity also takes away some possibilities.

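To make the "one line" concrete, here is a minimal sketch of what the set-and-forget request could look like on the two platforms that have it. This is not the v1 patch: the helper name request_parent_death_signal() and its return convention are made up for illustration. Only prctl(PR_SET_PDEATHSIG) on Linux and procctl(PROC_PDEATHSIG_CTL) on FreeBSD are real interfaces, and on any other platform the sketch simply reports failure.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__linux__)
#include <sys/prctl.h>
#elif defined(__FreeBSD__)
#include <sys/procctl.h>
#endif

/*
 * Hypothetical helper, not taken from the patch: ask the kernel to send
 * 'signo' to the calling process when its parent exits (signo 0 cancels
 * a previous request).
 *
 * Returns 0 if the request was armed, 1 if it was armed but the parent
 * had already gone by then, and -1 on failure or on platforms that have
 * no such facility.
 */
static int
request_parent_death_signal(int signo, pid_t expected_parent)
{
#if defined(__linux__)
	if (prctl(PR_SET_PDEATHSIG, (unsigned long) signo) < 0)
		return -1;
#elif defined(__FreeBSD__)
	if (procctl(P_PID, 0, PROC_PDEATHSIG_CTL, &signo) < 0)
		return -1;
#else
	(void) signo;
	(void) expected_parent;
	return -1;
#endif

#if defined(__linux__) || defined(__FreeBSD__)
	/*
	 * The parent may have exited between fork() and the call above, in
	 * which case no signal will ever arrive. Report that so the caller
	 * can shut down on its own.
	 */
	return (getppid() != expected_parent) ? 1 : 0;
#endif
}
```

A child would call this once, early after fork(), passing the parent's PID as captured before forking, e.g. request_parent_death_signal(SIGQUIT, parent_pid) with parent_pid being a placeholder name, and would treat a return value of 1 the same as receiving the signal.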
Yesterday I wrote "taking over the role of the departed postmaster", and realised it's not the whole enchilada: do we also want the "issuing SIGKILL to recalcitrant children" bit? I don't want this system to be complicated, rather the opposite, but I wonder if there is a nice way to make it run *literally* the same code as the postmaster. We'd need bulletproof data structure sharing, or preferably, no sharing of modifiable data at all. Some ideas I'm looking into: better use of process groups, or maybe doing the bookkeeping in memory that is not even mapped into children until they need it. Or something. Researching...