On 2019-Jul-25, Craig Ringer wrote:

> Patch attached.

Here's a non-broken version of this patch.  I have not done anything
other than reflowing the new comment.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 98eac4afa6995c57c5b8288c126f88b434268dbc Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Tue, 3 Sep 2019 18:17:11 -0400
Subject: [PATCH v2] Fix a delay in PostgreSQL shutdown caused by logical
 replication

Due to a race with WAL writing during shutdown, if logical walsenders were
running then PostgreSQL's shutdown could be delayed by up to
wal_sender_timeout/2 while it waits for the walsenders to shut down. The
walsenders wait for new WAL or end-of-wal which won't come until shutdown so
there's a deadlock. The walsender timeout eventually breaks the deadlock.

The issue was introduced by PostgreSQL 10 commit c6c3334364
"Prevent possibility of panics during shutdown checkpoint".

A logical walsender never enters WALSNDSTATE_STOPPING and allows the
checkpointer to continue shutdown. Instead it exits when it reads
end-of-WAL.  But if it reads the last WAL record written before shutdown
and that record doesn't generate a client network write, it can mark
itself caught up and go to sleep without checking to see if it's been
asked to shut down.

Fix by making sure the logical walsender always checks if it's been asked
to shut down before it allows the walsender main loop to go to sleep.

When this issue happens the walsender(s) can be seen to be sleeping in
WaitLatchOrSocket in WalSndLoop until woken by the keepalive timeout. The
checkpointer will be waiting in WalSndWaitStopping() for the walsenders to
enter WALSNDSTATE_STOPPING or exit, whichever happens first. The postmaster
will be waiting in ServerLoop for the checkpointer to finish the shutdown
checkpoint.
---
 src/backend/replication/walsender.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 23870a25a5..88fad46376 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2858,18 +2858,25 @@ XLogSendLogical(void)
 		 * point, then we're caught up.
 		 */
 		if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
-		{
 			WalSndCaughtUp = true;
-
-			/*
-			 * Have WalSndLoop() terminate the connection in an orderly
-			 * manner, after writing out all the pending data.
-			 */
-			if (got_STOPPING)
-				got_SIGUSR2 = true;
-		}
 	}
 
+	/*
+	 * If we've recently sent up to the currently flushed WAL and are shutting
+	 * down, we can safely wrap up by flushing buffers and exchanging CopyDone
+	 * messages. It doesn't matter if more WAL may be written before shutdown
+	 * because no WAL written after replication slots are checkpointed can
+	 * result in invocation of logical decoding hooks and output to the
+	 * client.
+	 *
+	 * We could instead WalSndSetState(WALSNDSTATE_STOPPING) to allow shutdown
+	 * to continue and put the walsender in a state where any unexpected WAL
+	 * records Assert.  But the net effect is the same and this is safer to
+	 * backpatch on customer systems.
+	 */
+	if (got_STOPPING && WalSndCaughtUp)
+		got_SIGUSR2 = true;
+
 	/* Update shared memory status */
 	{
 		WalSnd	   *walsnd = MyWalSnd;
-- 
2.17.1

Reply via email to