I just had a client issue with table bloat that I traced back to a stale xmin value in a replication slot. The xmin value from hot standby feedback is stored in the replication slot and used in the vacuum xmin calculation. If hot standby feedback is turned off while the walreceiver is active, the xmin gets reset by an HS feedback message containing InvalidTransactionId. However, if feedback is turned off while the standby is shut down, this message never gets sent, and a stale value is left behind in the replication slot, holding back vacuum.
The simple fix seems to be to always send at least one feedback message on each connect, regardless of the hot_standby_feedback setting. Patch attached. It looks like the problem goes back to version 9.4. The change could conceivably cause issues for replication middleware that does not know how to handle hot standby feedback messages. I am not sure whether any such middleware exists or whether that is a concern.

A shell script to reproduce the problem is also attached; adjust the PGPATH variable to point to your postgres install and run it in an empty directory.

Regards,
Ants Aasma
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index cc3cf7d..31333ec 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1157,7 +1157,9 @@ XLogWalRcvSendReply(bool force, bool requestReply)
  * in case they don't have a watch.
  *
  * If the user disables feedback, send one final message to tell sender
- * to forget about the xmin on this standby.
+ * to forget about the xmin on this standby. We also send this message
+ * on first connect because a previous connection might have set xmin
+ * on a replication slot.
  */
 static void
 XLogWalRcvSendHSFeedback(bool immed)
@@ -1167,7 +1169,7 @@ XLogWalRcvSendHSFeedback(bool immed)
 	uint32		nextEpoch;
 	TransactionId xmin;
 	static TimestampTz sendTime = 0;
-	static bool master_has_standby_xmin = false;
+	static bool master_has_standby_xmin = true;
 
 	/*
 	 * If the user doesn't want status to be reported to the master, be sure
@@ -1192,20 +1194,13 @@ XLogWalRcvSendHSFeedback(bool immed)
 	}
 
 	/*
-	 * If Hot Standby is not yet active there is nothing to send. Check this
-	 * after the interval has expired to reduce number of calls.
-	 */
-	if (!HotStandbyActive())
-	{
-		Assert(!master_has_standby_xmin);
-		return;
-	}
-
-	/*
 	 * Make the expensive call to get the oldest xmin once we are certain
 	 * everything else has been checked.
+	 *
+	 * If Hot Standby is not yet active we reset the xmin value. Check this
+	 * after the interval has expired to reduce number of calls.
 	 */
-	if (hot_standby_feedback)
+	if (hot_standby_feedback && HotStandbyActive())
 		xmin = GetOldestXmin(NULL, false);
 	else
 		xmin = InvalidTransactionId;
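To show why the initial value of master_has_standby_xmin matters, here is a small standalone toy model of the flag logic. It is only a sketch and not PostgreSQL code: ToySlot, ToyWalReceiver, toy_send_feedback and toy_slot_xmin_after_restart are hypothetical names made up for this illustration, the "message" is reduced to a direct assignment to the slot, and the status interval and HotStandbyActive() checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; the real definitions live in the PostgreSQL headers. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical model of the xmin kept in a replication slot on the primary. */
typedef struct
{
	TransactionId xmin;
} ToySlot;

/* Hypothetical model of the state of one walreceiver process lifetime. */
typedef struct
{
	bool		hot_standby_feedback;		/* the GUC setting */
	bool		master_has_standby_xmin;	/* the static flag from the patch */
} ToyWalReceiver;

/* Simplified feedback decision: skip, report an xmin, or reset it. */
static void
toy_send_feedback(ToyWalReceiver *rcv, ToySlot *slot, TransactionId oldest_xmin)
{
	TransactionId xmin;

	/* Feedback is off and we believe the primary holds no xmin: send nothing. */
	if (!rcv->hot_standby_feedback && !rcv->master_has_standby_xmin)
		return;

	xmin = rcv->hot_standby_feedback ? oldest_xmin : InvalidTransactionId;

	slot->xmin = xmin;			/* the "message" reaching the primary */
	rcv->master_has_standby_xmin = (xmin != InvalidTransactionId);
}

/*
 * Run with feedback on, shut the standby down, turn feedback off, and
 * reconnect with a fresh process. Returns the xmin left in the slot.
 */
static TransactionId
toy_slot_xmin_after_restart(bool initial_flag)
{
	ToySlot		slot = {InvalidTransactionId};
	ToyWalReceiver first = {true, initial_flag};
	ToyWalReceiver second = {false, initial_flag};	/* flag starts over */

	toy_send_feedback(&first, &slot, 1000);		/* slot now holds xmin 1000 */
	toy_send_feedback(&second, &slot, 1200);	/* first message after restart */

	return slot.xmin;
}
```

In this model, toy_slot_xmin_after_restart(false) leaves 1000 in the slot, which is the stale value described above, while toy_slot_xmin_after_restart(true) returns InvalidTransactionId because the restarted process sends one reset message before going quiet.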
slot-xmin-not-reset-reproduce.sh
Description: Bourne shell script
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers