On Wed, Apr 11, 2018 at 12:03 PM, Andres Freund <and...@anarazel.de> wrote:
> On 2018-04-11 11:57:20 +1200, Thomas Munro wrote:
>> Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts()
>> (ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based
>> approach where available as suggested by Andres[2] or fall back to
>> polling a reusable WaitEventSet (timeout = 0), then there'd be no more
>> calls to PostmasterIsAlive() outside latch.c.
>
> I'm still unconvinced by this. There's good reasons why code might be
> busy-looping without checking the latch, and we shouldn't force code to
> add more latch checks if unnecessary. Resetting the latch frequently can
> actually increase the amount of context switches considerably.

What latch? There wouldn't be a latch in a WaitEventSet that is used
only to detect postmaster death.

I arrived at this idea via the realisation that the closest thing to
prctl(PR_SET_PDEATHSIG) on BSD-family systems today is
please-tell-my-kqueue-if-this-process-dies. It so happens that my
kqueue patch already uses that instead of the pipe to detect postmaster
death. The only problem is that you have to ask it, by calling
kevent(). In a busy loop like those two, where there is no other kind
of waiting going on, you could do that by calling kevent() with
timeout = 0 to check the queue.

You could probably figure out a way to hide the
prctl(PR_SET_PDEATHSIG)-based approach inside the WaitEventSet code,
with a fast path that doesn't make any system calls if the only event
registered is postmaster death (you can just check the global variable
set by your signal handler). But I guess you wouldn't like the extra
function call, so I guess you'd prefer to check the global variable
directly in the busy loop, in builds that have prctl(PR_SET_PDEATHSIG).

--
Thomas Munro
http://www.enterprisedb.com
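
For illustration only, here is a minimal sketch of the "check the global variable set by your signal handler" idea on Linux. It is not taken from any patch: the names parent_died, parent_death_handler(), arm_parent_death_signal() and parent_is_alive() are hypothetical, and the choice of SIGUSR2 is an assumption made purely for the example. Once armed, the check in the busy loop is a plain memory read with no system call.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/prctl.h>          /* Linux-only: prctl(), PR_SET_PDEATHSIG */
#include <unistd.h>

/* Hypothetical flag, set asynchronously when the parent process exits. */
static volatile sig_atomic_t parent_died = 0;

/* Signal handler: only records the event, which is async-signal-safe. */
static void
parent_death_handler(int signo)
{
    (void) signo;
    parent_died = 1;
}

/*
 * Ask the kernel to send us SIGUSR2 (an arbitrary choice for this sketch)
 * when our parent exits.  expected_parent is the parent's PID as known by
 * the caller.  Returns 0 on success, -1 on failure.
 */
static int
arm_parent_death_signal(pid_t expected_parent)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = parent_death_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    if (prctl(PR_SET_PDEATHSIG, (unsigned long) SIGUSR2) != 0)
        return -1;

    /* The parent may already have exited before prctl() took effect. */
    if (getppid() != expected_parent)
        parent_died = 1;

    return 0;
}

/* Cheap check for use inside a busy loop: no system call involved. */
static inline bool
parent_is_alive(void)
{
    return !parent_died;
}
```

A busy loop would call arm_parent_death_signal() once at startup and then test parent_is_alive() on each iteration. On systems without prctl(PR_SET_PDEATHSIG), the fallback described above (kevent() or a WaitEventSet polled with timeout = 0) would take the place of the flag check.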