Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay

2019-10-18 Thread Craig Ringer
On Thu, 17 Oct 2019 at 21:19, Alvaro Herrera wrote:

> On 2019-Sep-26, Alvaro Herrera wrote:
>
> > On 2019-Sep-26, Jeff Janes wrote:
>
> > > Hi Alvaro, does this count as a review?
> >
> > Well, I'm already a second pair of eyes for Craig's code, so I think it
> > does :-)  I would have liked confirmation from Craig that my change
> > looks okay to him too, but maybe we'll have to go without that.
>
> There not being a third pair of eyes, I have pushed this.
>
> Thanks!
>

Thanks.

I'm struggling to keep up with my own threads right now...


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay

2019-10-17 Thread Alvaro Herrera
On 2019-Sep-26, Alvaro Herrera wrote:

> On 2019-Sep-26, Jeff Janes wrote:

> > Hi Alvaro, does this count as a review?
> 
> Well, I'm already a second pair of eyes for Craig's code, so I think it
> does :-)  I would have liked confirmation from Craig that my change
> looks okay to him too, but maybe we'll have to go without that.

There not being a third pair of eyes, I have pushed this.

Thanks!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay

2019-09-26 Thread Alvaro Herrera
On 2019-Sep-26, Jeff Janes wrote:

> On Wed, Sep 11, 2019 at 3:52 PM Alvaro Herrera wrote:
> 
> > Reading over this code, I noticed that the detection of the catch-up
> > state ends up being duplicate code, so I would rework that function as
> > in the attached patch.
> >
> > The naming of those flags (got_SIGUSR2, got_STOPPING) is terrible, but
> > I'm not going to change that in a backpatchable bug fix.
> 
> Hi Alvaro, does this count as a review?

Well, I'm already a second pair of eyes for Craig's code, so I think it
does :-)  I would have liked confirmation from Craig that my change
looks okay to him too, but maybe we'll have to go without that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay

2019-09-26 Thread Jeff Janes
On Wed, Sep 11, 2019 at 3:52 PM Alvaro Herrera wrote:

>
> Reading over this code, I noticed that the detection of the catch-up
> state ends up being duplicate code, so I would rework that function as
> in the attached patch.
>
> The naming of those flags (got_SIGUSR2, got_STOPPING) is terrible, but
> I'm not going to change that in a backpatchable bug fix.
>

Hi Alvaro, does this count as a review?  And Craig, do you agree with
Alvaro's patch as a replacement for your own?

Thanks,

Jeff


Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay

2019-09-11 Thread Alvaro Herrera
On 2019-Sep-03, Alvaro Herrera wrote:

> On 2019-Jul-25, Craig Ringer wrote:
> 
> > Patch attached.
> 
> Here's a non-broken version of this patch.  I have not done anything
> other than reflowing the new comment.

Reading over this code, I noticed that the detection of the catch-up
state ends up being duplicate code, so I would rework that function as
in the attached patch.

The naming of those flags (got_SIGUSR2, got_STOPPING) is terrible, but
I'm not going to change that in a backpatchable bug fix.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 6721ed3559da2b3781b7a0820f165444e573eaee Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 3 Sep 2019 18:17:11 -0400
Subject: [PATCH v3] Fix a delay in PostgreSQL shutdown caused by logical
 replication

Due to a race with WAL writing during shutdown, if logical walsenders were
running then PostgreSQL's shutdown could be delayed by up to
wal_sender_timeout/2 while it waits for the walsenders to shut down. The
walsenders wait for new WAL or end-of-wal which won't come until shutdown so
there's a deadlock. The walsender timeout eventually breaks the deadlock.

The issue was introduced by PostgreSQL 10 commit c6c3334364
"Prevent possibility of panics during shutdown checkpoint".

A logical walsender never enters WALSNDSTATE_STOPPING and allows the
checkpointer to continue shutdown. Instead it exits when it reads
end-of-WAL.  But if it reads the last WAL record written before shutdown
and that record doesn't generate a client network write, it can mark
itself caught up and go to sleep without checking to see if it's been
asked to shut down.

Fix by making sure the logical walsender always checks if it's been asked
to shut down before it allows the walsender main loop to go to sleep.

When this issue happens the walsender(s) can be seen to be sleeping in
WaitLatchOrSocket in WalSndLoop until woken by the keepalive timeout. The
checkpointer will be waiting in WalSndWaitStopping() for the walsenders to
enter WALSNDSTATE_STOPPING or exit, whichever happens first. The postmaster
will be waiting in ServerLoop for the checkpointer to finish the shutdown
checkpoint.
---
 src/backend/replication/walsender.c | 49 +
 1 file changed, 21 insertions(+), 28 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 23870a25a5..1c5380fa7d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1296,7 +1296,6 @@ WalSndWaitForWal(XLogRecPtr loc)
 	int			wakeEvents;
 	static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;
 
-
 	/*
 	 * Fast path to avoid acquiring the spinlock in case we already know we
 	 * have enough WAL available. This is particularly interesting if we're
@@ -2814,6 +2813,7 @@ XLogSendLogical(void)
 {
 	XLogRecord *record;
 	char	   *errm;
+	XLogRecPtr	flushPtr;
 
 	/*
 	 * Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
@@ -2830,11 +2830,15 @@ XLogSendLogical(void)
 	if (errm != NULL)
 		elog(ERROR, "%s", errm);
 
+	/*
+	 * Read current flush point; we'll use it to determine whether we've
+	 * caught up.  Note that logical decoding cannot be used while in
+	 * recovery, so ... XXX what?
+	 */
+	flushPtr = GetFlushRecPtr();
+
 	if (record != NULL)
 	{
-		/* XXX: Note that logical decoding cannot be used while in recovery */
-		XLogRecPtr	flushPtr = GetFlushRecPtr();
-
 		/*
 		 * Note the lack of any call to LagTrackerWrite() which is handled by
 		 * WalSndUpdateProgress which is called by output plugin through
@@ -2843,32 +2847,21 @@ XLogSendLogical(void)
 		LogicalDecodingProcessRecord(logical_decoding_ctx, logical_decoding_ctx->reader);
 
 		sentPtr = logical_decoding_ctx->reader->EndRecPtr;
-
-		/*
-		 * If we have sent a record that is at or beyond the flushed point, we
-		 * have caught up.
-		 */
-		if (sentPtr >= flushPtr)
-			WalSndCaughtUp = true;
 	}
-	else
-	{
-		/*
-		 * If the record we just wanted read is at or beyond the flushed
-		 * point, then we're caught up.
-		 */
-		if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
-		{
-			WalSndCaughtUp = true;
 
-			/*
-			 * Have WalSndLoop() terminate the connection in an orderly
-			 * manner, after writing out all the pending data.
-			 */
-			if (got_STOPPING)
-				got_SIGUSR2 = true;
-		}
-	}
+	/*
+	 * We might have caught up; set flag if so.
+	 */
+	if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
+		WalSndCaughtUp = true;
+
+	/*
+	 * If we're caught up and have been requested to stop, have WalSndLoop()
+	 * terminate the connection in an orderly manner, after writing out all
+	 * the pending data.
+	 */
+	if (WalSndCaughtUp && got_STOPPING)
+		got_SIGUSR2 = true;
 
 	/* Update shared memory status */
 	{
-- 
2.17.1



Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay

2019-09-03 Thread Alvaro Herrera
On 2019-Jul-25, Craig Ringer wrote:

> Patch attached.

Here's a non-broken version of this patch.  I have not done anything
other than reflowing the new comment.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 98eac4afa6995c57c5b8288c126f88b434268dbc Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 3 Sep 2019 18:17:11 -0400
Subject: [PATCH v2] Fix a delay in PostgreSQL shutdown caused by logical
 replication

Due to a race with WAL writing during shutdown, if logical walsenders were
running then PostgreSQL's shutdown could be delayed by up to
wal_sender_timeout/2 while it waits for the walsenders to shut down. The
walsenders wait for new WAL or end-of-wal which won't come until shutdown so
there's a deadlock. The walsender timeout eventually breaks the deadlock.

The issue was introduced by PostgreSQL 10 commit c6c3334364
"Prevent possibility of panics during shutdown checkpoint".

A logical walsender never enters WALSNDSTATE_STOPPING and allows the
checkpointer to continue shutdown. Instead it exits when it reads
end-of-WAL.  But if it reads the last WAL record written before shutdown
and that record doesn't generate a client network write, it can mark
itself caught up and go to sleep without checking to see if it's been
asked to shut down.

Fix by making sure the logical walsender always checks if it's been asked
to shut down before it allows the walsender main loop to go to sleep.

When this issue happens the walsender(s) can be seen to be sleeping in
WaitLatchOrSocket in WalSndLoop until woken by the keepalive timeout. The
checkpointer will be waiting in WalSndWaitStopping() for the walsenders to
enter WALSNDSTATE_STOPPING or exit, whichever happens first. The postmaster
will be waiting in ServerLoop for the checkpointer to finish the shutdown
checkpoint.
---
 src/backend/replication/walsender.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 23870a25a5..88fad46376 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2858,18 +2858,25 @@ XLogSendLogical(void)
 		 * point, then we're caught up.
 		 */
 		if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
-		{
 			WalSndCaughtUp = true;
-
-			/*
-			 * Have WalSndLoop() terminate the connection in an orderly
-			 * manner, after writing out all the pending data.
-			 */
-			if (got_STOPPING)
-				got_SIGUSR2 = true;
-		}
 	}
 
+	/*
+	 * If we've recently sent up to the currently flushed WAL and are shutting
+	 * down, we can safely wrap up by flushing buffers and exchanging CopyDone
+	 * messages. It doesn't matter if more WAL may be written before shutdown
+	 * because no WAL written after replication slots are checkpointed can
+	 * result in invocation of logical decoding hooks and output to the
+	 * client.
+	 *
+	 * We could instead WalSndSetState(WALSNDSTATE_STOPPING) to allow shutdown
+	 * to continue and put the walsender in a state where any unexpected WAL
+	 * records Assert.  But the net effect is the same and this is safer to
+	 * backpatch on customer systems.
+	 */
+	if (got_STOPPING && WalSndCaughtUp)
+		got_SIGUSR2 = true;
+
 	/* Update shared memory status */
 	{
 		WalSnd	   *walsnd = MyWalSnd;
-- 
2.17.1



[PATCH] Race condition in logical walsender causes long postgresql shutdown delay

2019-07-24 Thread Craig Ringer
Hi folks

I recently tracked down a race in shutdown of logical walsenders that can
cause PostgreSQL shutdown to hang for wal_sender_timeout/2 before it
continues to a normal shutdown. With a long timeout that can be quite
disruptive.

TL;DR: The logical walsender may be signalled to stop, then read the last
WAL record before the shutdown checkpoint is due to be written and go to
sleep. The checkpointer will wait for it to acknowledge the shutdown and
the walsender will wait for new WAL. The deadlock is eventually broken by
the walsender timeout keepalive timer.

Patch attached.

The issue arises from the difference between logical and physical walsender
shutdown as introduced by commit c6c3334364 "Prevent possibility of panics
during shutdown checkpoint". It's probably fairly hard to trigger. I ran
into a case where it happened regularly only because of an unrelated patch
that caused some WAL to be written just before the checkpointer issued
walsender shutdown requests. But it's still a legit bug.

If you hit the issue you'll see that walsender(s) can be seen to be
sleeping in WaitLatchOrSocket in WalSndLoop. They'll keep sleeping until
woken by the keepalive timeout. The checkpointer will be waiting in
WalSndWaitStopping() for the walsenders to enter WALSNDSTATE_STOPPING or
exit, whichever happens first. The postmaster will be waiting in ServerLoop
for the checkpointer to finish the shutdown checkpoint.

The checkpointer waits in WalSndWaitStopping() for all walsenders to either
exit or enter WALSNDSTATE_STOPPING state. Logical walsenders never enter
WALSNDSTATE_STOPPING; they go straight to exiting, so the checkpointer can't
finish WalSndWaitStopping() and write the shutdown checkpoint until they're
gone. A logical walsender usually notices the shutdown request and exits as
soon as it has flushed all WAL up to the server's flush point, while
physical walsenders enter WALSNDSTATE_STOPPING.

But there's a race: a logical walsender may read the final available record
and notice it has caught up, yet never check whether it has reached
end-of-WAL and should therefore exit. This happens on the following
(simplified) code path in XLogSendLogical:

if (record != NULL)
{
    XLogRecPtr  flushPtr = GetFlushRecPtr();
    LogicalDecodingProcessRecord(...);
    sentPtr = ...;
    if (sentPtr >= flushPtr)
        WalSndCaughtUp = true;   // <-- HERE
}

because the test for got_STOPPING that sets got_SIGUSR2 exists only on the
other branch, taken when reading a record returns NULL; the branch shown
above can go back to sleep without ever checking whether shutdown was
requested.

So if the walsender has read the last available WAL record, that record's
end position is at or past the flush pointer, and it has already consumed
the SIGUSR1 latch wakeup for that WAL write, it can go back to sleep and not
wake up until the timeout.
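
The attached patch therefore performs the stop-request check unconditionally
after record processing, so the caught-up path can no longer skip it.
Condensed (the patch itself carries a longer explanatory comment), the key
change is:

if (got_STOPPING && WalSndCaughtUp)
    got_SIGUSR2 = true;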

The checkpointer already sent PROCSIG_WALSND_INIT_STOPPING to the
walsenders in the prior WalSndInitStopping() call so the walsender won't be
woken by a signal from the checkpointer. No new WAL will be written because
the walsender just consumed the final record written before the
checkpointer went to sleep, and the checkpointer won't write anything more
until the walsender exits. The client might not be due a keepalive for some
time. The only reason this doesn't turn into a total deadlock is that
keepalive wakeup.

An alternative fix would be to have the logical walsender set
WALSNDSTATE_STOPPING instead of faking got_SIGUSR2, then go to sleep
waiting for more WAL. Logical decoding would need to check if it was
running during shutdown and Assert(...) then ERROR if it saw any WAL
records that result in output plugin calls or snapshot management calls. I
avoided this approach as it's more intrusive and I'm not confident I can
concoct a reliable test to trigger it.
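
Purely as a rough, hypothetical sketch (this is not what the attached patch
does, and where the decoding-side check would live is hand-waved), that
alternative would look something like:

/* In XLogSendLogical(), instead of faking got_SIGUSR2: */
if (WalSndCaughtUp && got_STOPPING)
    WalSndSetState(WALSNDSTATE_STOPPING);

/*
 * ...and the logical decoding path would Assert (then ERROR) if asked to
 * process a record that reaches the output plugin or snapshot management
 * while the walsender is in WALSNDSTATE_STOPPING.  Finding the right
 * place to hook in that check is what makes this approach more intrusive.
 */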

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise
From 559dda09b35870d3630a65cbca682e50343c6f0f Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 25 Jul 2019 09:14:58 +0800
Subject: [PATCH] Fix a delay in PostgreSQL shutdown caused by logical
 replication

Due to a race with WAL writing during shutdown, if logical walsenders were
running then PostgreSQL's shutdown could be delayed by up to
wal_sender_timeout/2 while it waits for the walsenders to shut down. The
walsenders wait for new WAL or end-of-wal which won't come until shutdown so
there's a deadlock. The walsender timeout eventually breaks the deadlock.

The issue was introduced by PostgreSQL 10 commit c6c3334364
"Prevent possibility of panics during shutdown checkpoint".

A logical walsender never enters WALSNDSTATE_STOPPING and allows the
checkpointer to continue shutdown. Instead it exits when it reads end-of-WAL.
But if it reads the last WAL record written before shutdown and that record
doesn't generate a client network write, it can mark itself caught up and go to
sleep without checking to see if it's been asked to shut down.