Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay

Alvaro Herrera Wed, 11 Sep 2019 12:53:18 -0700

On 2019-Sep-03, Alvaro Herrera wrote:

> On 2019-Jul-25, Craig Ringer wrote:
> 
> > Patch attached.
> 
> Here's a non-broken version of this patch.  I have not done anything
> other than reflowing the new comment.


Reading over this code, I noticed that the detection of the catch-up
state ends up being duplicate code, so I would rework that function as
in the attached patch.

The naming of those flags (got_SIGUSR2, got_STOPPING) is terrible, but
I'm not going to change that in a backpatchable bug fix.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From 6721ed3559da2b3781b7a0820f165444e573eaee Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Tue, 3 Sep 2019 18:17:11 -0400
Subject: [PATCH v3] Fix a delay in PostgreSQL shutdown caused by logical
 replication

Due to a race with WAL writing during shutdown, if logical walsenders were
running then PostgreSQL's shutdown could be delayed by up to
wal_sender_timeout/2 while it waits for the walsenders to shut down. The
walsenders wait for new WAL or end-of-wal which won't come until shutdown so
there's a deadlock. The walsender timeout eventually breaks the deadlock.

The issue was introduced by PostgreSQL 10 commit c6c3334364
"Prevent possibility of panics during shutdown checkpoint".

A logical walsender never enters WALSNDSTATE_STOPPING and allows the
checkpointer to continue shutdown. Instead it exits when it reads
end-of-WAL.  But if it reads the last WAL record written before shutdown
and that record doesn't generate a client network write, it can mark
itself caught up and go to sleep without checking to see if it's been
asked to shut down.

Fix by making sure the logical walsender always checks if it's been asked
to shut down before it allows the walsender main loop to go to sleep.

When this issue happens the walsender(s) can be seen to be sleeping in
WaitLatchOrSocket in WalSndLoop until woken by the keepalive timeout. The
checkpointer will be waiting in WalSndWaitStopping() for the walsenders to
enter WALSNDSTATE_STOPPING or exit, whichever happens first. The postmaster
will be waiting in ServerLoop for the checkpointer to finish the shutdown
checkpoint.
---
 src/backend/replication/walsender.c | 49 +++++++++++++----------------
 1 file changed, 21 insertions(+), 28 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 23870a25a5..1c5380fa7d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1296,7 +1296,6 @@ WalSndWaitForWal(XLogRecPtr loc)
 	int			wakeEvents;
 	static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;
 
-
 	/*
 	 * Fast path to avoid acquiring the spinlock in case we already know we
 	 * have enough WAL available. This is particularly interesting if we're
@@ -2814,6 +2813,7 @@ XLogSendLogical(void)
 {
 	XLogRecord *record;
 	char	   *errm;
+	XLogRecPtr	flushPtr;
 
 	/*
 	 * Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
@@ -2830,11 +2830,15 @@ XLogSendLogical(void)
 	if (errm != NULL)
 		elog(ERROR, "%s", errm);
 
+	/*
+	 * Read current flush point; we'll use it to determine whether we've
+	 * caught up.  Note that logical decoding cannot be used while in
+	 * recovery, so ... XXX what?
+	 */
+	flushPtr = GetFlushRecPtr();
+
 	if (record != NULL)
 	{
-		/* XXX: Note that logical decoding cannot be used while in recovery */
-		XLogRecPtr	flushPtr = GetFlushRecPtr();
-
 		/*
 		 * Note the lack of any call to LagTrackerWrite() which is handled by
 		 * WalSndUpdateProgress which is called by output plugin through
@@ -2843,32 +2847,21 @@ XLogSendLogical(void)
 		LogicalDecodingProcessRecord(logical_decoding_ctx, logical_decoding_ctx->reader);
 
 		sentPtr = logical_decoding_ctx->reader->EndRecPtr;
-
-		/*
-		 * If we have sent a record that is at or beyond the flushed point, we
-		 * have caught up.
-		 */
-		if (sentPtr >= flushPtr)
-			WalSndCaughtUp = true;
 	}
-	else
-	{
-		/*
-		 * If the record we just wanted read is at or beyond the flushed
-		 * point, then we're caught up.
-		 */
-		if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
-		{
-			WalSndCaughtUp = true;
 
-			/*
-			 * Have WalSndLoop() terminate the connection in an orderly
-			 * manner, after writing out all the pending data.
-			 */
-			if (got_STOPPING)
-				got_SIGUSR2 = true;
-		}
-	}
+	/*
+	 * We might have caught up; set flag if so.
+	 */
+	if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
+		WalSndCaughtUp = true;
+
+	/*
+	 * If we're caught up and have been requested to stop, have WalSndLoop()
+	 * terminate the connection in an orderly manner, after writing out all
+	 * the pending data.
+	 */
+	if (WalSndCaughtUp && got_STOPPING)
+		got_SIGUSR2 = true;
 
 	/* Update shared memory status */
 	{
-- 
2.17.1

Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay

Reply via email to