I just had a client issue with table bloat that I traced back to a
stale xmin value in a replication slot. The xmin value from hot standby
feedback is stored in the replication slot and used in the vacuum xmin
calculation. If hot standby feedback is turned off while the walreceiver
is active, the xmin gets reset by an HS feedback message containing
InvalidTransactionId. However, if feedback gets turned off while the
standby is shut down, this message never gets sent and a stale value
gets left behind in the replication slot, holding back vacuum.

The simple fix seems to be to always send out at least one feedback
message on each connect, regardless of the hot_standby_feedback setting.
Patch attached. The problem looks like it goes back to version 9.4. The
fix could conceivably cause issues for replication middleware that does
not know how to handle hot standby feedback messages. I am not sure
whether any such middleware exists or whether that is a concern.

A shell script to reproduce the problem is also attached. Adjust the
PGPATH variable to point to your postgres install and run the script in
an empty directory.

Regards,
Ants Aasma
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index cc3cf7d..31333ec 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1157,7 +1157,9 @@ XLogWalRcvSendReply(bool force, bool requestReply)
  * in case they don't have a watch.
  *
  * If the user disables feedback, send one final message to tell sender
- * to forget about the xmin on this standby.
+ * to forget about the xmin on this standby. We also send this message
+ * on first connect because a previous connection might have set xmin
+ * on a replication slot.
  */
 static void
 XLogWalRcvSendHSFeedback(bool immed)
@@ -1167,7 +1169,7 @@ XLogWalRcvSendHSFeedback(bool immed)
 	uint32		nextEpoch;
 	TransactionId xmin;
 	static TimestampTz sendTime = 0;
-	static bool master_has_standby_xmin = false;
+	static bool master_has_standby_xmin = true;
 
 	/*
 	 * If the user doesn't want status to be reported to the master, be sure
@@ -1192,20 +1194,13 @@ XLogWalRcvSendHSFeedback(bool immed)
 	}
 
 	/*
-	 * If Hot Standby is not yet active there is nothing to send. Check this
-	 * after the interval has expired to reduce number of calls.
-	 */
-	if (!HotStandbyActive())
-	{
-		Assert(!master_has_standby_xmin);
-		return;
-	}
-
-	/*
 	 * Make the expensive call to get the oldest xmin once we are certain
 	 * everything else has been checked.
+	 *
+	 * If Hot Standby is not yet active we reset the xmin value. Check this
+	 * after the interval has expired to reduce number of calls.
 	 */
-	if (hot_standby_feedback)
+	if (hot_standby_feedback && HotStandbyActive())
 		xmin = GetOldestXmin(NULL, false);
 	else
 		xmin = InvalidTransactionId;

Attachment: slot-xmin-not-reset-reproduce.sh
Description: Bourne shell script
