On Tue, Sep 20, 2016 at 11:26 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2016-09-20 11:07:03 +1200, Thomas Munro wrote:
>> Yeah, I wondered why that was different than the pattern established
>> elsewhere when I was hacking on replication code.  There are actually
>> several places where we call PostmasterIsAlive() unconditionally in a
>> loop that waits for WL_POSTMASTER_DEATH but ignores the return code:
> Note that just changing this implies a behavioural change in at least
> some of those: If the loop is busy with work, the WaitLatch might never
> be reached.  I think that might be ok in most of those, but it does
> require examination.

Rebased, but I don't actually like this patch any more.  Over in
another thread[1] I proposed that we should just make exit(1) the
default behaviour built into latch.c for those cases that don't want
to do something special (eg SyncRepWaitForLSN()), so we don't finish
up duplicating the ugly exit(1) code everywhere (or worse, forgetting
to).  Tom Lane seemed to think that was an idea worth pursuing.

I think what we need for PG12 is a patch that does that, and also
introduces a reused WaitEventSet object in several of these places.
Then eg SyncRepWaitForLSN() won't be interacting with contended kernel
objects every time (assuming an implementation like epoll or
eventually kqueue, completion ports etc is available).

Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts()
(ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based
approach where available as suggested by Andres[2] or fall back to
polling a reusable WaitEventSet (timeout = 0), then there'd be no more
calls to PostmasterIsAlive() outside latch.c.


Thomas Munro

Attachment: 0001-Replace-PostmasterIsAlive-calls-with-WL_POSTMASTER_D.patch
Description: Binary data

Reply via email to