On 2019-Jul-25, Craig Ringer wrote: > Patch attached.
Here's a non-broken version of this patch. I have not done anything other than reflowing the new comment. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 98eac4afa6995c57c5b8288c126f88b434268dbc Mon Sep 17 00:00:00 2001 From: Alvaro Herrera <alvhe...@alvh.no-ip.org> Date: Tue, 3 Sep 2019 18:17:11 -0400 Subject: [PATCH v2] Fix a delay in PostgreSQL shutdown caused by logical replication Due to a race with WAL writing during shutdown, if logical walsenders were running then PostgreSQL's shutdown could be delayed by up to wal_sender_timeout/2 while it waits for the walsenders to shut down. The walsenders wait for new WAL or end-of-wal which won't come until shutdown so there's a deadlock. The walsender timeout eventually breaks the deadlock. The issue was introduced by PostgreSQL 10 commit c6c3334364 "Prevent possibility of panics during shutdown checkpoint". A logical walsender never enters WALSNDSTATE_STOPPING and allows the checkpointer to continue shutdown. Instead it exits when it reads end-of-WAL. But if it reads the last WAL record written before shutdown and that record doesn't generate a client network write, it can mark itself caught up and go to sleep without checking to see if it's been asked to shut down. Fix by making sure the logical walsender always checks if it's been asked to shut down before it allows the walsender main loop to go to sleep. When this issue happens the walsender(s) can be seen to be sleeping in WaitLatchOrSocket in WalSndLoop until woken by the keepalive timeout. The checkpointer will be waiting in WalSndWaitStopping() for the walsenders to enter WALSNDSTATE_STOPPING or exit, whichever happens first. The postmaster will be waiting in ServerLoop for the checkpointer to finish the shutdown checkpoint. --- src/backend/replication/walsender.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 23870a25a5..88fad46376 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -2858,18 +2858,25 @@ XLogSendLogical(void) * point, then we're caught up. */ if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr()) - { WalSndCaughtUp = true; - - /* - * Have WalSndLoop() terminate the connection in an orderly - * manner, after writing out all the pending data. - */ - if (got_STOPPING) - got_SIGUSR2 = true; - } } + /* + * If we've recently sent up to the currently flushed WAL and are shutting + * down, we can safely wrap up by flushing buffers and exchanging CopyDone + * messages. It doesn't matter if more WAL may be written before shutdown + * because no WAL written after replication slots are checkpointed can + * result in invocation of logical decoding hooks and output to the + * client. + * + * We could instead WalSndSetState(WALSNDSTATE_STOPPING) to allow shutdown + * to continue and put the walsender in a state where any unexpected WAL + * records Assert. But the net effect is the same and this is safer to + * backpatch on customer systems. + */ + if (got_STOPPING && WalSndCaughtUp) + got_SIGUSR2 = true; + /* Update shared memory status */ { WalSnd *walsnd = MyWalSnd; -- 2.17.1