From fb3440742d4c30dcda526eea759d4294975daa27 Mon Sep 17 00:00:00 2001
From: Michail Nikolaev <michail.nikolaev@gmail.com>
Date: Fri, 16 Sep 2022 18:38:02 +0300
Subject: [PATCH v8] Currently, KnownAssignedXidsGetAndSetXmin requires an
 iterative loop through the KnownAssignedXids array, including xids marked as
 invalid. The performance impact is especially noticeable with long
 (few-second) transactions on the primary, a high value (a few thousand) of
 max_connections, and a heavy read workload on the standby. Most of the CPU
 time is spent looping through KnownAssignedXids while almost all xids are
 invalid anyway. KnownAssignedXidsCompress removes invalid xids from time to
 time, but performance is still affected.

To improve performance, KnownAssignedXidsCompress is now run more frequently.
KnownAssignedXidsRemoveTree calls it whenever xid % 64 == 0 (the value was
selected by running benchmarks). Also, the minimum number of elements required
before compressing (4 * PROCARRAY_MAXPROCS) is removed.

Simon Riggs, with some editorialization by Michail Nikolaev.
---
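Reviewer note: the combined effect of the two heuristics can be hard to see
from the hunks alone, so here is a minimal standalone sketch of them. It is not
the procarray.c code: SketchXidArray and every sketch_* name are hypothetical,
the array is a plain local struct with no locking or shared memory, removal
uses a linear search, and subxids are ignored. Only the 50% spread test and
the xid % 64 filter are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CAPACITY 1024			/* hypothetical array size */
#define SKETCH_COMPRESS_FREQUENCY 64	/* same value as KAX_COMPRESS_FREQUENCY */

/* Hypothetical stand-in for the KnownAssignedXids array and its valid flags. */
typedef struct SketchXidArray
{
	uint32_t	xids[SKETCH_CAPACITY];
	bool		valid[SKETCH_CAPACITY];
	int			tail;		/* index of the first entry in the spread */
	int			head;		/* index one past the last entry */
	int			nvalid;		/* entries in [tail, head) still marked valid */
} SketchXidArray;

/* Append an xid at the head; the caller guarantees there is room. */
static void
sketch_add(SketchXidArray *a, uint32_t xid)
{
	assert(a->head < SKETCH_CAPACITY);
	a->xids[a->head] = xid;
	a->valid[a->head] = true;
	a->head++;
	a->nvalid++;
}

/* Mark an xid invalid (linear search for simplicity); true if it was found. */
static bool
sketch_remove(SketchXidArray *a, uint32_t xid)
{
	for (int i = a->tail; i < a->head; i++)
	{
		if (a->valid[i] && a->xids[i] == xid)
		{
			a->valid[i] = false;
			a->nvalid--;
			return true;
		}
	}
	return false;
}

/*
 * Move the valid entries to the front of the array. Unless forced, do
 * nothing while at least half of the spread is still in use. Returns true
 * if the entries were actually moved.
 */
static bool
sketch_compress(SketchXidArray *a, bool force)
{
	int			nelements = a->head - a->tail;
	int			dst = 0;

	if (!force && nelements < 2 * a->nvalid)
		return false;

	for (int src = a->tail; src < a->head; src++)
	{
		if (a->valid[src])
		{
			a->xids[dst] = a->xids[src];
			a->valid[dst] = true;
			dst++;
		}
	}
	a->tail = 0;
	a->head = dst;
	return true;
}

/* Remove an xid, then consider compressing only on every 64th xid value. */
static bool
sketch_remove_tree(SketchXidArray *a, uint32_t xid)
{
	sketch_remove(a, xid);
	if (xid != 0 && xid % SKETCH_COMPRESS_FREQUENCY == 0)
		return sketch_compress(a, false);
	return false;
}
```

In this sketch, removing xids 1 to 63 from an array holding 1 to 128 never
attempts compression. Removing xid 64 passes the modulus filter, and the
spread (128) is then no longer less than twice the 64 valid entries, so the
array is compacted.
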
 src/backend/storage/ipc/procarray.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0555b02a8d..af86529dc8 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4617,18 +4617,21 @@ KnownAssignedXidsCompress(bool force)
 	{
 		/*
 		 * If we can choose how much to compress, use a heuristic to avoid
-		 * compressing too often or not often enough.
+		 * compressing too often or not often enough. "Compress" here means
+		 * simply moving the values to the beginning of the array, so it is
+		 * not as complex or costly as typical data compression algorithms.
 		 *
-		 * Heuristic is if we have a large enough current spread and less than
-		 * 50% of the elements are currently in use, then compress. This
-		 * should ensure we compress fairly infrequently. We could compress
-		 * less often though the virtual array would spread out more and
-		 * snapshots would become more expensive.
+		 * We would like to put an upper bound on the size of the current
+		 * spread, S, to reduce the number of cachelines that need to be read,
+		 * which is essential to avoid limiting scalability for readers.
+		 * Apply the heuristic that if fewer than 50% of the elements in the
+		 * current spread are in use, then compress. We will likely stray
+		 * higher than this because of the additional heuristic applied in
+		 * KnownAssignedXidsRemoveTree(), but benchmarks show this is ok.
 		 */
 		int			nelements = head - tail;
 
-		if (nelements < 4 * PROCARRAY_MAXPROCS ||
-			nelements < 2 * pArray->numKnownAssignedXids)
+		if (nelements < 2 * pArray->numKnownAssignedXids)
 			return;
 	}
 
@@ -4908,7 +4911,8 @@ KnownAssignedXidsRemove(TransactionId xid)
 
 /*
  * KnownAssignedXidsRemoveTree
- *		Remove xid (if it's not InvalidTransactionId) and all the subxids.
+ *		Remove xid (if it's not InvalidTransactionId) and all the subxids,
+ *		typically run when applying transaction end records.
  *
  * Caller must hold ProcArrayLock in exclusive mode.
  */
@@ -4924,8 +4928,25 @@ KnownAssignedXidsRemoveTree(TransactionId xid, int nsubxids,
 	for (i = 0; i < nsubxids; i++)
 		KnownAssignedXidsRemove(subxids[i]);
 
-	/* Opportunistically compress the array */
-	KnownAssignedXidsCompress(false);
+	/*
+	 * Opportunistically consider whether to compress the array.
+	 *
+	 * Performance results showed that we were doing this too often when
+	 * we considered only the size of the array, so reduce the frequency
+	 * by applying an additional heuristic filter based simply on the
+	 * modulus of the xid itself, to reduce this to every few attempts.
+	 *
+	 * Extensive benchmarking has shown that a frequency of every 64 xids
+	 * works well on large multi-core servers. We might expect this to
+	 * vary somewhat depending upon workload, but we do not have enough
+	 * information about that to make the value adaptive here.
+	 *
+	 * XXX consider running compression in a background worker.
+	 */
+#define KAX_COMPRESS_FREQUENCY	64
+	if (TransactionIdIsValid(xid) &&
+		((int) xid) % KAX_COMPRESS_FREQUENCY == 0)
+		KnownAssignedXidsCompress(false);
 }
 
 /*
