Hi,

I recently found couple of issues with the way initial logical decoding
snapshot is made.

First one is outright bug, which has to do with how we track running
transactions. What snapbuild basically does while doing initial snapshot
is read the xl_running_xacts record, store the list of running txes and
then wait until they all finish. The problem with this is that
xl_running_xacts does not ensure that it only logs transactions that are
actually still running (to avoid locking PGPROC) so there might be xids
in xl_running_xacts that already committed before it was logged. This in
turn means that the snapbuild will never make snapshot if the situation
occurs. The reason why this didn't bite us much yet is that the
snapbuild will consider all transactions finished if the empty
xl_running_xacts comes which usually happens after a while (but might
not on a consistently busy server).

The fix I came up with this is to extend the mechanism to also consider
all transactions finished when xl_running_xacts which has xmin bigger
then that previous xmax comes. This seems to work pretty well on busy
servers. This fix is attached as
0001-Mark-snapshot-consistent-when-all-running-txes-have.patch. I
believe it should be backpatched all the way to 9.4.


The other issue is performance problem again on busy servers with
initial snapshot. We track transactions for catalog modifications so
that we can do proper catalog time travel for decoding of changes. But
for transactions that were running while we started trying to get
initial consistent snapshot, there is no good way to tell if they did
catalog changes or not so we consider them all as catalog changing. We
make separate historical snapshot for every such transaction. This by
itself is fine, the problem is that current implementation also
considers all the transactions that started after we started watching
for changes but before we reached consistent state to also do catalog
changes even though there we actually do know if they did any catalog
change or not. In practice it means we make snapshots that are not
really necessary and if there was long running transaction for which the
snapshot builder has to wait for then we can create thousands of unused
snapshots which affects performance in bad ways (I've seen the initial
snapshot taking hour because of this).

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't make snapshots for transactions that we seen
wholly and know that they didn't make catalog changes even if we didn't
reach consistency yet.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 67e902eef33da9e241c079eaed9ab12a88616296 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sat, 10 Dec 2016 21:56:44 +0100
Subject: [PATCH 1/2] Mark snapshot consistent when all running txes have
 finished.

---
 src/backend/replication/logical/snapbuild.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8b59fc5..5836d52 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1213,9 +1213,11 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	 * Build catalog decoding snapshot incrementally using information about
 	 * the currently running transactions. There are several ways to do that:
 	 *
-	 * a) There were no running transactions when the xl_running_xacts record
-	 *	  was inserted, jump to CONSISTENT immediately. We might find such a
-	 *	  state we were waiting for b) and c).
+	 * a) Either there were no running transactions when the xl_running_xacts
+	 *    record was inserted (we might find this while waiting for b) or c))
+	 *    or the running transactions we've been tracking have all finished
+	 *    by the time the xl_running_xacts was inserted. We can jump to
+	 *    CONSISTENT immediately.
 	 *
 	 * b) Wait for all toplevel transactions that were running to end. We
 	 *	  simply track the number of in-progress toplevel transactions and
@@ -1251,13 +1253,18 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	}
 
 	/*
-	 * a) No transaction were running, we can jump to consistent.
+	 * a) Either no transaction were running or we've been tracking running
+	 *    transactions and the new snapshot has them all finished, we can
+	 *    jump to consistent.
 	 *
-	 * NB: We might have already started to incrementally assemble a snapshot,
-	 * so we need to be careful to deal with that.
+	 * NB: Since we might have already started to incrementally assemble a
+	 * snapshot, we need to be careful to deal with that.
 	 */
-	if (running->xcnt == 0)
+	if (running->xcnt == 0 ||
+		(builder->running.xcnt > 0 &&
+		 running->oldestRunningXid > builder->running.xmax))
 	{
+
 		if (builder->start_decoding_at == InvalidXLogRecPtr ||
 			builder->start_decoding_at <= lsn)
 			/* can decode everything after this */
@@ -1278,10 +1285,16 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 
 		builder->state = SNAPBUILD_CONSISTENT;
 
+		/*
+		 * Give different log detail based on actual state for easier
+		 * debugging.
+		 */
 		ereport(LOG,
 				(errmsg("logical decoding found consistent point at %X/%X",
 						(uint32) (lsn >> 32), (uint32) lsn),
-				 errdetail("There are no running transactions.")));
+				 errdetail(running->xcnt == 0 ?
+						   "There are no running transactions." :
+						   "All running transactions have finished.")));
 
 		return false;
 	}
-- 
2.7.4

From 31ea8f255e531c9327f7a5d55c5c0c756b37739c Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sat, 10 Dec 2016 22:22:13 +0100
Subject: [PATCH 2/2] Skip unnecessary snapshot builds

---
 src/backend/replication/logical/snapbuild.c | 38 +++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 5836d52..3cf4829 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -955,6 +955,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 	bool		forced_timetravel = false;
 	bool		sub_needs_timetravel = false;
 	bool		top_needs_timetravel = false;
+	bool		build_snapshot = true;
 
 	TransactionId xmax = xid;
 
@@ -976,10 +977,19 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		/*
 		 * We could avoid treating !SnapBuildTxnIsRunning transactions as
 		 * timetravel ones, but we want to be able to export a snapshot when
-		 * we reached consistency.
+		 * we reached consistency so we need to keep track of them.
 		 */
 		forced_timetravel = true;
 		elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
+
+		/*
+		 * It is however desirable to skip building new snapshot for
+		 * !SnapBuildTxnIsRunning transactions as otherwise we might end up
+		 * building thousands of unused snapshots on busy servers which can
+		 * be very expensive.
+		 */
+		if (!SnapBuildTxnIsRunning(builder, xid))
+			build_snapshot = false;
 	}
 
 	for (nxact = 0; nxact < nsubxacts; nxact++)
@@ -1069,15 +1079,25 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
 			return;
 
+		/* Make sure we always build snapshot if there is no existing one. */
+		build_snapshot = build_snapshot || !builder->snapshot;
+
 		/*
-		 * Decrease the snapshot builder's refcount of the old snapshot, note
-		 * that it still will be used if it has been handed out to the
-		 * reorderbuffer earlier.
+		 * Decrease the snapshot builder's refcount of the old snapshot if we
+		 * plan to build new one, note that it still will be used if it has
+		 * been handed out to the reorderbuffer earlier.
 		 */
-		if (builder->snapshot)
+		if (builder->snapshot && build_snapshot)
 			SnapBuildSnapDecRefcount(builder->snapshot);
 
-		builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+		/* Build new snapshot unless asked not to. */
+		if (build_snapshot)
+		{
+			builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+
+			/* refcount of the snapshot builder for the new snapshot */
+			SnapBuildSnapIncRefcount(builder->snapshot);
+		}
 
 		/* we might need to execute invalidations, add snapshot */
 		if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
@@ -1087,11 +1107,9 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 										 builder->snapshot);
 		}
 
-		/* refcount of the snapshot builder for the new snapshot */
-		SnapBuildSnapIncRefcount(builder->snapshot);
-
 		/* add a new Snapshot to all currently running transactions */
-		SnapBuildDistributeNewCatalogSnapshot(builder, lsn);
+		if (build_snapshot)
+			SnapBuildDistributeNewCatalogSnapshot(builder, lsn);
 	}
 	else
 	{
-- 
2.7.4

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to