On 2019-Sep-03, Alvaro Herrera wrote: > On 2019-Jul-25, Craig Ringer wrote: > > > Patch attached. > > Here's a non-broken version of this patch. I have not done anything > other than reflowing the new comment.
Reading over this code, I noticed that the detection of the catch-up state ends up being duplicate code, so I would rework that function as in the attached patch. The naming of those flags (got_SIGUSR2, got_STOPPING) is terrible, but I'm not going to change that in a backpatchable bug fix. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 6721ed3559da2b3781b7a0820f165444e573eaee Mon Sep 17 00:00:00 2001 From: Alvaro Herrera <alvhe...@alvh.no-ip.org> Date: Tue, 3 Sep 2019 18:17:11 -0400 Subject: [PATCH v3] Fix a delay in PostgreSQL shutdown caused by logical replication Due to a race with WAL writing during shutdown, if logical walsenders were running then PostgreSQL's shutdown could be delayed by up to wal_sender_timeout/2 while it waits for the walsenders to shut down. The walsenders wait for new WAL or end-of-wal which won't come until shutdown so there's a deadlock. The walsender timeout eventually breaks the deadlock. The issue was introduced by PostgreSQL 10 commit c6c3334364 "Prevent possibility of panics during shutdown checkpoint". A logical walsender never enters WALSNDSTATE_STOPPING and allows the checkpointer to continue shutdown. Instead it exits when it reads end-of-WAL. But if it reads the last WAL record written before shutdown and that record doesn't generate a client network write, it can mark itself caught up and go to sleep without checking to see if it's been asked to shut down. Fix by making sure the logical walsender always checks if it's been asked to shut down before it allows the walsender main loop to go to sleep. When this issue happens the walsender(s) can be seen to be sleeping in WaitLatchOrSocket in WalSndLoop until woken by the keepalive timeout. The checkpointer will be waiting in WalSndWaitStopping() for the walsenders to enter WALSNDSTATE_STOPPING or exit, whichever happens first. The postmaster will be waiting in ServerLoop for the checkpointer to finish the shutdown checkpoint. --- src/backend/replication/walsender.c | 49 +++++++++++++---------------- 1 file changed, 21 insertions(+), 28 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 23870a25a5..1c5380fa7d 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1296,7 +1296,6 @@ WalSndWaitForWal(XLogRecPtr loc) int wakeEvents; static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr; - /* * Fast path to avoid acquiring the spinlock in case we already know we * have enough WAL available. This is particularly interesting if we're @@ -2814,6 +2813,7 @@ XLogSendLogical(void) { XLogRecord *record; char *errm; + XLogRecPtr flushPtr; /* * Don't know whether we've caught up yet. We'll set WalSndCaughtUp to @@ -2830,11 +2830,15 @@ XLogSendLogical(void) if (errm != NULL) elog(ERROR, "%s", errm); + /* + * Read current flush point; we'll use it to determine whether we've + * caught up. Note that logical decoding cannot be used while in + * recovery, so ... XXX what? + */ + flushPtr = GetFlushRecPtr(); + if (record != NULL) { - /* XXX: Note that logical decoding cannot be used while in recovery */ - XLogRecPtr flushPtr = GetFlushRecPtr(); - /* * Note the lack of any call to LagTrackerWrite() which is handled by * WalSndUpdateProgress which is called by output plugin through @@ -2843,32 +2847,21 @@ XLogSendLogical(void) LogicalDecodingProcessRecord(logical_decoding_ctx, logical_decoding_ctx->reader); sentPtr = logical_decoding_ctx->reader->EndRecPtr; - - /* - * If we have sent a record that is at or beyond the flushed point, we - * have caught up. - */ - if (sentPtr >= flushPtr) - WalSndCaughtUp = true; } - else - { - /* - * If the record we just wanted read is at or beyond the flushed - * point, then we're caught up. - */ - if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr()) - { - WalSndCaughtUp = true; - /* - * Have WalSndLoop() terminate the connection in an orderly - * manner, after writing out all the pending data. - */ - if (got_STOPPING) - got_SIGUSR2 = true; - } - } + /* + * We might have caught up; set flag if so. + */ + if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr) + WalSndCaughtUp = true; + + /* + * If we're caught up and have been requested to stop, have WalSndLoop() + * terminate the connection in an orderly manner, after writing out all + * the pending data. + */ + if (WalSndCaughtUp && got_STOPPING) + got_SIGUSR2 = true; /* Update shared memory status */ { -- 2.17.1