Re: [HACKERS] CREATE REPLICATION SLOT fails on a timeout

Andres Freund Fri, 16 May 2014 16:35:34 -0700

On 2014-05-16 17:02:33 -0400, Steve Singer wrote:
> >I don't think that's going to cut it though. The creation can take
> >longer than whatever wal_sender_timeout is set to (when there's lots of
> >longrunning transactions). I think checking whether last_reply_timestamp
> >= 0 during timeout checking is more robust.


> That makes sense.
> A patch that does that is attached.

I've attached a heavily revised version of that patch. Didn't touch all
the necessary places...

Survives creation of logical replication slots under 'intensive
pressure', with a wal_sender_timeout=10ms.

Thanks for the bugreport!

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

>From 324d74e16dd8ee2a0fa977d92a269fd43a746196 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Sat, 17 May 2014 01:22:01 +0200
Subject: [PATCH] Don't pay heed to wal_sender_timeout while creating a
 decoding slot.

Sometimes CREATE_REPLICATION_SLOT ... LOGICAL ... needs to wait for
futher WAL using WalSndWaitForWal(). That used to always respect
wal_sender_timeout and kill the session when waiting long enough
because no feedback/ping messages can be sent while the slot is still
being created.
Add the notion that last_reply_timestamp = 0 means that the walsender
currently doesn't need timeout processing and add checks for it in a
couple of places.

Bugreport and initial patch by Steve Singer, revised by Andres Freund.
---
 src/backend/replication/walsender.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5c11d68..90394ce 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -148,9 +148,10 @@ static StringInfoData reply_message;
 static StringInfoData tmpbuf;
 
 /*
- * Timestamp of the last receipt of the reply from the standby.
+ * Timestamp of the last receipt of the reply from the standby. Set to 0 if a
+ * the process currently shouldn't be killed by wal_sender_timeout.
  */
-static TimestampTz last_reply_timestamp;
+static TimestampTz last_reply_timestamp = 0;
 
 /* Have we sent a heartbeat message asking for reply, since last reply? */
 static bool waiting_for_ping_response = false;
@@ -796,6 +797,14 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 										logical_read_xlog_page,
 										WalSndPrepareWrite, WalSndWriteData);
 
+		/*
+		 * Signal that we don't need the timeout mechanism. We're just
+		 * creating the replication slot and don't yet accept feedback
+		 * messages or send keepalives. As we possibly need to wait for
+		 * further WAL the walsender would possibly be killed too soon.
+		 */
+		last_reply_timestamp = 0;
+
 		/* build initial snapshot, might take a while */
 		DecodingContextFindStartpoint(ctx);
 
@@ -1693,7 +1702,7 @@ WalSndComputeSleeptime(TimestampTz now)
 {
 	long		sleeptime = 10000;		/* 10 s */
 
-	if (wal_sender_timeout > 0)
+	if (wal_sender_timeout > 0 && last_reply_timestamp > 0)
 	{
 		TimestampTz wakeup_time;
 		long		sec_to_timeout;
@@ -1735,6 +1744,10 @@ WalSndCheckTimeOut(TimestampTz now)
 {
 	TimestampTz timeout;
 
+	/* don't bail out if we're doing something that doesn't require timeouts */
+	if (last_reply_timestamp <= 0)
+		return;
+
 	timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
 										  wal_sender_timeout);
 
@@ -2879,7 +2892,11 @@ WalSndKeepaliveIfNecessary(TimestampTz now)
 {
 	TimestampTz ping_time;
 
-	if (wal_sender_timeout <= 0)
+	/*
+	 * Don't send keepalive messages if timeouts are globally disabled or
+	 * we're doing something not partaking in timeouts.
+	 */
+	if (wal_sender_timeout <= 0 || last_reply_timestamp <= 0)
 		return;
 
 	if (waiting_for_ping_response)
-- 
2.0.0.rc2.4.g1dc51c6.dirty

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CREATE REPLICATION SLOT fails on a timeout

Reply via email to