On Tue, 2011-02-15 at 12:15 -0500, Robert Haas wrote:
> Looks pretty good to me, though I haven't tested it.  I like some of
> the safety valves you put in there, but I don't understand this part

Reworked logic covering all feedback, plus tests, plus docs.

Last comments before commit please.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 63c6283..30c33fb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2029,6 +2029,10 @@ SET ENABLE_SEQSCAN TO OFF;
         This parameter can only be set in the <filename>postgresql.conf</>
         file or on the server command line.
        </para>
+       <para>
+        You should also consider setting <varname>hot_standby_feedback</>
+        as an alternative to using this parameter.
+       </para>
       </listitem>
      </varlistentry>
      </variablelist>
@@ -2121,6 +2125,22 @@ SET ENABLE_SEQSCAN TO OFF;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-hot-standby-feedback" xreflabel="hot_standby_feedback">
+      <term><varname>hot_standby_feedback</varname> (<type>boolean</type>)</term>
+      <indexterm>
+       <primary><varname>hot_standby_feedback</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies whether or not a hot standby will send feedback to the primary
+        about queries currently executing on the standby. This parameter can
+        be used to eliminate query cancels caused by cleanup records, though
+        it can cause database bloat on the primary for some workloads.
+        The default value is <literal>off</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
    </sect1>
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index a892969..6941e67 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1486,23 +1486,6 @@ if (!triggered)
    </para>
 
    <para>
-    The most common reason for conflict between standby queries and WAL replay
-    is <quote>early cleanup</>.  Normally, <productname>PostgreSQL</> allows
-    cleanup of old row versions when there are no transactions that need to
-    see them to ensure correct visibility of data according to MVCC rules.
-    However, this rule can only be applied for transactions executing on the
-    master.  So it is possible that cleanup on the master will remove row
-    versions that are still visible to a transaction on the standby.
-   </para>
-
-   <para>
-    Experienced users should note that both row version cleanup and row version
-    freezing will potentially conflict with standby queries. Running a manual
-    <command>VACUUM FREEZE</> is likely to cause conflicts even on tables with
-    no updated or deleted rows.
-   </para>
-
-   <para>
     Once the delay specified by <varname>max_standby_archive_delay</> or
     <varname>max_standby_streaming_delay</> has been exceeded, conflicting
     queries will be cancelled.  This usually results just in a cancellation
@@ -1529,6 +1512,23 @@ if (!triggered)
    </para>
 
    <para>
+    The most common reason for conflict between standby queries and WAL replay
+    is <quote>early cleanup</>.  Normally, <productname>PostgreSQL</> allows
+    cleanup of old row versions when there are no transactions that need to
+    see them to ensure correct visibility of data according to MVCC rules.
+    However, this rule can only be applied for transactions executing on the
+    master.  So it is possible that cleanup on the master will remove row
+    versions that are still visible to a transaction on the standby.
+   </para>
+
+   <para>
+    Experienced users should note that both row version cleanup and row version
+    freezing will potentially conflict with standby queries. Running a manual
+    <command>VACUUM FREEZE</> is likely to cause conflicts even on tables with
+    no updated or deleted rows.
+   </para>
+
+   <para>
     Users should be clear that tables that are regularly and heavily updated
     on the primary server will quickly cause cancellation of longer running
     queries on the standby. In such cases the setting of a finite value for
@@ -1539,12 +1539,10 @@ if (!triggered)
 
    <para>
     Remedial possibilities exist if the number of standby-query cancellations
-    is found to be unacceptable.  The first option is to connect to the
-    primary server and keep a query active for as long as needed to
-    run queries on the standby. This prevents <command>VACUUM</> from removing
-    recently-dead rows and so cleanup conflicts do not occur.
-    This could be done using <xref linkend="dblink"> and
-    <function>pg_sleep()</>, or via other mechanisms. If you do this, you
+    is found to be unacceptable.  The first option is to set the parameter
+    <varname>hot_standby_feedback</>, which prevents <command>VACUUM</> from
+    removing recently-dead rows and so cleanup conflicts do not occur.
+    If you do this, you
     should note that this will delay cleanup of dead rows on the primary,
     which may result in undesirable table bloat. However, the cleanup
     situation will be no worse than if the standby queries were running
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 3277da8..e23c4e5 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -45,6 +45,7 @@
 #include "replication/walreceiver.h"
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
+#include "storage/procarray.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -56,6 +57,7 @@ bool		am_walreceiver;
 
 /* GUC variable */
 int			wal_receiver_status_interval;
+bool		hot_standby_feedback;
 
 /* libpqreceiver hooks to these when loaded */
 walrcv_connect_type walrcv_connect = NULL;
@@ -610,10 +612,28 @@ XLogWalRcvSendReply(void)
 	reply_message.apply = GetXLogReplayRecPtr();
 	reply_message.sendTime = now;
 
-	elog(DEBUG2, "sending write %X/%X flush %X/%X apply %X/%X",
+	/*
+	 * Get the OldestXmin and its associated epoch
+	 */
+	if (hot_standby_feedback)
+	{
+		TransactionId	nextXid;
+		uint32			nextEpoch;
+
+		reply_message.xmin = GetOldestXmin(true, false);
+
+		GetNextXidAndEpoch(&nextXid, &nextEpoch);
+		if (nextXid < reply_message.xmin)
+			nextEpoch--;
+		reply_message.epoch = nextEpoch;
+	}
+
+	elog(DEBUG2, "sending write %X/%X flush %X/%X apply %X/%X xmin %u epoch %u",
 				 reply_message.write.xlogid, reply_message.write.xrecoff,
 				 reply_message.flush.xlogid, reply_message.flush.xrecoff,
-				 reply_message.apply.xlogid, reply_message.apply.xrecoff);
+				 reply_message.apply.xlogid, reply_message.apply.xrecoff,
+				 reply_message.xmin,
+				 reply_message.epoch);
 
 	/* Prepend with the message type and send it. */
 	buf[0] = 'r';
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3ad95b4..bf58482 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -53,6 +53,7 @@
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
+#include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -498,6 +499,7 @@ ProcessStandbyReplyMessage(void)
 	static StringInfoData input_message;
 	StandbyReplyMessage	reply;
 	char msgtype;
+	TransactionId newxmin = InvalidTransactionId;
 
 	initStringInfo(&input_message);
 
@@ -524,10 +526,12 @@ ProcessStandbyReplyMessage(void)
 
 	pq_copymsgbytes(&input_message, (char *) &reply, sizeof(StandbyReplyMessage));
 
-	elog(DEBUG2, "write %X/%X flush %X/%X apply %X/%X ",
+	elog(DEBUG2, "write %X/%X flush %X/%X apply %X/%X xmin %u epoch %u",
 		 reply.write.xlogid, reply.write.xrecoff,
 		 reply.flush.xlogid, reply.flush.xrecoff,
-		 reply.apply.xlogid, reply.apply.xrecoff);
+		 reply.apply.xlogid, reply.apply.xrecoff,
+		 reply.xmin,
+		 reply.epoch);
 
 	/*
 	 * Update shared state for this WalSender process
@@ -543,6 +547,69 @@ ProcessStandbyReplyMessage(void)
 		walsnd->apply = reply.apply;
 		SpinLockRelease(&walsnd->mutex);
 	}
+
+	/*
+	 * Update the WalSender's proc xmin to allow it to be visible
+	 * to snapshots. This will hold back the removal of dead rows
+	 * and thereby prevent the generation of cleanup conflicts
+	 * on the standby server.
+	 */
+	if (TransactionIdIsValid(reply.xmin))
+	{
+		TransactionId	nextXid;
+		uint32			nextEpoch;
+		bool			epochOK = false;
+
+		GetNextXidAndEpoch(&nextXid, &nextEpoch);
+
+		/*
+		 * Epoch of the standby's xmin should match our current epoch or,
+		 * if the xid counter has wrapped since, be one less than ours.
+		 */
+		if (reply.xmin <= nextXid)
+		{
+			if (reply.epoch == nextEpoch)
+				epochOK = true;
+		}
+		else
+		{
+			if (nextEpoch > 0 && reply.epoch == nextEpoch - 1)
+				epochOK = true;
+		}
+
+		/*
+		 * Feedback from standby must not go backwards, nor should it go
+		 * forwards further than our most recent xid.
+		 */
+		if (epochOK && TransactionIdPrecedesOrEquals(reply.xmin, nextXid))
+		{
+			if (!TransactionIdIsValid(MyProc->xmin))
+			{
+				TransactionId oldestXmin = GetOldestXmin(true, true);
+
+				if (TransactionIdPrecedes(oldestXmin, reply.xmin))
+					newxmin = reply.xmin;
+				else
+					newxmin = oldestXmin;
+			}
+			else
+			{
+				Assert(TransactionIdIsValid(MyProc->xmin));
+
+				if (TransactionIdPrecedes(MyProc->xmin, reply.xmin))
+					newxmin = reply.xmin;
+				else
+					newxmin = MyProc->xmin; /* stay the same */
+			}
+		}
+	}
+
+	/*
+	 * Grab the ProcArrayLock to set xmin, or invalidate for bad reply
+	 */
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	MyProc->xmin = newxmin;
+	LWLockRelease(ProcArrayLock);
 }
 
 /* Main loop of walsender process */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 470183d..08cf941 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1279,6 +1279,15 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
+		{"hot_standby_feedback", PGC_SIGHUP, WAL_STANDBY_SERVERS,
+			gettext_noop("Allows feedback from a hot standby to the primary that will avoid query conflicts."),
+			NULL
+		},
+		&hot_standby_feedback,
+		false, NULL, NULL
+	},
+
+	{
 		{"allow_system_table_mods", PGC_POSTMASTER, DEVELOPER_OPTIONS,
 			gettext_noop("Allows modifications of the structure of system tables."),
 			NULL,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 5d31365..3ef6813 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -196,6 +196,7 @@
 
 #hot_standby = off			# "on" allows queries during recovery
 					# (change requires restart)
+#hot_standby_feedback = off	# info from standby to prevent query conflicts
 #max_standby_archive_delay = 30s	# max delay before canceling queries
 					# when reading WAL from archive;
 					# -1 allows indefinite delay
diff --git a/src/include/replication/walprotocol.h b/src/include/replication/walprotocol.h
index 32c4962..b026143 100644
--- a/src/include/replication/walprotocol.h
+++ b/src/include/replication/walprotocol.h
@@ -56,6 +56,15 @@ typedef struct
 	XLogRecPtr	flush;
 	XLogRecPtr	apply;
 
+	/*
+	 * The current xmin and epoch from the standby, for Hot Standby feedback.
+	 * This may be invalid if the standby-side does not support feedback,
+	 * or Hot Standby is not yet available.
+	 */
+	TransactionId	xmin;
+	uint32			epoch;
+
+
 	/* Sender's system clock at the time of transmission */
 	TimestampTz sendTime;
 } StandbyReplyMessage;
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index aa5bfb7..9137b86 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -18,6 +18,7 @@
 
 extern bool am_walreceiver;
 extern int wal_receiver_status_interval;
+extern bool hot_standby_feedback;
 
 /*
  * MAXCONNINFO: maximum size of a connection string.
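
For anyone reviewing the epoch handling: the arithmetic in the walreceiver.c and walsender.c hunks can be checked in isolation. What follows is only a minimal standalone sketch of that logic, not code from the patch. The function names `xmin_epoch` and `feedback_epoch_ok` are hypothetical, and `TransactionId` is typedef'd locally to a 32-bit unsigned integer so the file compiles without the PostgreSQL headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's TransactionId (assumption for this sketch). */
typedef uint32_t TransactionId;

/*
 * Standby side (hypothetical helper): pick the epoch that belongs to xmin,
 * given the next xid to be assigned and the epoch of that next xid.
 * If nextXid is numerically below xmin, the counter has wrapped since
 * xmin was assigned, so xmin belongs to the previous epoch.
 */
static uint32_t
xmin_epoch(TransactionId xmin, TransactionId nextXid, uint32_t nextEpoch)
{
	if (nextXid < xmin)
		nextEpoch--;
	return nextEpoch;
}

/*
 * Primary side (hypothetical helper): check that the (xmin, epoch) pair
 * reported by the standby is consistent with the primary's own counter.
 */
static bool
feedback_epoch_ok(TransactionId xmin, uint32_t epoch,
				  TransactionId nextXid, uint32_t nextEpoch)
{
	/* No wraparound between xmin and nextXid: epochs must match. */
	if (xmin <= nextXid)
		return epoch == nextEpoch;

	/* Wrapped: xmin must come from the immediately preceding epoch. */
	return nextEpoch > 0 && epoch == nextEpoch - 1;
}
```

With these definitions, `feedback_epoch_ok(4000000000u, 4, 50, 5)` accepts a pre-wrap xmin from epoch 4, while the same xmin reported with epoch 5 is rejected. The sketch covers only the epoch comparison; it does not model the later `TransactionIdPrecedesOrEquals` test or the update of `MyProc->xmin`.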