On Tue, Sep 20, 2016 at 11:26 AM, Andres Freund <and...@anarazel.de> wrote: > On 2016-09-20 11:07:03 +1200, Thomas Munro wrote: >> Yeah, I wondered why that was different than the pattern established >> elsewhere when I was hacking on replication code. There are actually >> several places where we call PostmasterIsAlive() unconditionally in a >> loop that waits for WL_POSTMASTER_DEATH but ignores the return code: > > Note that just changing this implies a behavioural change in at least > some of those: If the loop is busy with work, the WaitLatch might never > be reached. I think that might be ok in most of those, but it does > require examination.
Rebased, but I don't actually like this patch any more. Over in another thread I proposed that we should just make exit(1) the default behaviour built into latch.c for those cases that don't want to do something special (eg SyncRepWaitForLSN()), so we don't finish up duplicating the ugly exit(1) code everywhere (or worse, forgetting to). Tom Lane seemed to think that was an idea worth pursuing. I think what we need for PG12 is a patch that does that, and also introduces a reused WaitEventSet object in several of these places. Then eg SyncRepWaitForLSN() won't be interacting with contended kernel objects every time (assuming an implementation like epoll or eventually kqueue, completion ports etc is available). Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts() (ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based approach where available as suggested by Andres or fall back to polling a reusable WaitEventSet (timeout = 0), then there'd be no more calls to PostmasterIsAlive() outside latch.c.  https://www.postgresql.org/message-id/CAEepm%3D2LqHzizbe7muD7-2yHUbTOoF7Q%2BqkSD5Q41kuhttRTwA%40mail.gmail.com  https://www.postgresql.org/message-id/flat/7261eb39-0369-f2f4-1bb5-62f3b6083b5e%40iki.fi -- Thomas Munro http://www.enterprisedb.com
Description: Binary data