On Sat, Mar 27, 2021 at 4:52 AM Andrey Borodin <x4...@yandex-team.ru> wrote:
> Some thoughts on HashTable patch:
> 1. Can we allocate bigger hashtable to reduce probability of collisions?

Yeah, good idea, might require some study.

> 2. Can we use specialised hashtable for this case? I'm afraid hash_search() 
> does comparable number of CPU cycles as simple cycle from 0 to 128. We could 
> inline everything and avoid hashp->hash(keyPtr, hashp->keysize) call. I'm not 
> insisting on special hash though, just an idea.

I tried really hard to not fall into this rabbit h.... [hack hack
hack], OK, here's a first attempt to use simplehash, Andres's
steampunk macro-based robinhood template that we're already using for
several other things, and murmurhash which is inlineable and
branch-free.  I had to tweak it to support "in-place" creation and
fixed size (in other words, no allocators, for use in shared memory).
Then I was annoyed that I had to add a "status" member to our struct,
so I tried to fix that.  Definitely needs more work to think about
failure modes when running out of memory, how much spare space you
need, etc.

I have not experimented with this much beyond hacking until the tests
pass, but it *should* be more efficient...

> 3. pageno in SlruMappingTableEntry seems to be unused.

It's the key (dynahash uses the first N bytes of your struct as the
key, but in this new simplehash version it's more explicit).
From 5f5d4ed8ae2808766ac1fd48f68602ef530e3833 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Mon, 15 Feb 2021 21:51:56 +0500
Subject: [PATCH v13 1/5] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Author: Andrey M. Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Gilles Darold <gilles@darold.net>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86%40yandex-team.ru
---
 doc/src/sgml/config.sgml                      | 137 ++++++++++++++++++
 src/backend/access/transam/clog.c             |   6 +
 src/backend/access/transam/commit_ts.c        |   5 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 +
 src/backend/utils/misc/guc.c                  |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h                |   4 -
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 15 files changed, 261 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ddc6d789d8..f1112bfa9c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1886,6 +1886,143 @@ include_dir 'conf.d'
        </para>
       </listitem>
      </varlistentry>
+     
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to used to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>8</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to used to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+     
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to used to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>8</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+     
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to used to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>8</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+     
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to used to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+     
+    <varlistentry id="guc-clog-buffers" xreflabel="clog_buffers">
+      <term><varname>clog_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>clog_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to used to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not more than 128 or
+        fewer than 4 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+     
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of memory to be used to cache the cotents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not more than 128 or
+        fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
 
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 6fa4713fb4..0318e8ff59 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -659,6 +659,9 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
+ * If values is configured via GUC - just use given value. Otherwise
+ * apply following euristics.
+ *
  * On larger multi-processor systems, it is possible to have many CLOG page
  * requests in flight at one time which could lead to disk access for CLOG
  * page if the required page is not found in memory.  Testing revealed that we
@@ -675,6 +678,9 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 Size
 CLOGShmemBuffers(void)
 {
+	/* consider 0 and 1 as unset GUC */
+	if (clog_buffers > 1)
+		return clog_buffers;
 	return Min(128, Max(4, NBuffers / 512));
 }
 
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 268bdba339..0d2632b90e 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -530,7 +530,10 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(16, Max(4, NBuffers / 1024));
+	/* consider 0 and 1 as unset GUC */
+	if (commit_ts_buffers > 1)
+		return commit_ts_buffers;
+	return Min(16, Max(4, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1f9f1a1fa1..21787765e2 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1831,8 +1831,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1848,13 +1848,13 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_offsets_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6a8e521f89..785f2520fd 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -184,14 +185,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4b16fb5682..de17f52cd7 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -107,7 +107,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by slru.c's wraparound limit; see QUEUE_MAX_PAGE below.
@@ -225,7 +225,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -514,7 +514,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -562,7 +562,7 @@ AsyncShmemInit(void)
 	 * Set up SLRU management of the pg_notify data.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE);
 
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index d493aeef0f..b1f4f1651d 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -872,7 +872,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
@@ -1395,7 +1395,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 73e0a672ae..e90275be6c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -148,3 +148,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 8;
+int			multixact_members_buffers = 16;
+int			subtrans_buffers = 32;
+int			notify_buffers = 8;
+int			serial_buffers = 16;
+int			clog_buffers = 0;
+int			commit_ts_buffers = 0;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0c5dc4d3e8..003bc820d2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2305,6 +2305,83 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for MultiXact offsets SLRU."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		8, 2, INT_MAX / 2,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for MultiXact members SLRU."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		16, 2, INT_MAX / 2,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for substransactions SLRU."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		32, 2, INT_MAX / 2,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for asyncronous notifications SLRU."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		8, 2, INT_MAX / 2,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for predicate locks SLRU."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		16, 2, INT_MAX / 2,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"clog_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for commit log SLRU."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&clog_buffers,
+		0, 0, INT_MAX / 2,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for commit timestamps SLRU."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		0, 0, INT_MAX / 2,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b234a6bfe6..1b8515989b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -190,6 +190,15 @@
 					# (change requires restart)
 #backend_flush_after = 0		# measured in pages, 0 disables
 
+# - SLRU Buffers (change requires restart) -
+
+#clog_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 32			# memory for pg_subtrans
+#multixact_offsets_buffers = 8		# memory for pg_multixact/offsets
+#multixact_members_buffers = 16		# memory for pg_multixact/members
+#notify_buffers = 8			# memory for pg_nofity
+#serial_buffers = 16			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # WRITE-AHEAD LOG
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 4bbb035eae..97c0a46376 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index d0ab44ae82..ca0999056e 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 9217f66b91..fa831e3721 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern bool Trace_notify;
 extern volatile sig_atomic_t notifyInterruptPending;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 013850ac28..9c325b4312 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -162,6 +162,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int clog_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 152b698611..c72779bd88 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern int	max_predicate_locks_per_xact;
 extern int	max_predicate_locks_per_relation;
 extern int	max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
-- 
2.30.1

From 3441027b9ac2f135dea7ec155503be9d6331dfc1 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 27 Mar 2021 08:34:58 +1300
Subject: [PATCH v13 2/5] Make simplehash easy to use in shmem.

Allow "in-place" creation of a simplehash hash table of fixed size,
suitable for use in shared memory.  No calling out to allocators, and no
ability to grow.
---
 src/include/lib/simplehash.h | 56 +++++++++++++++++++++++++++++++++---
 1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 395be1ca9a..32d3fa58fe 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -120,6 +120,7 @@
 #define SH_ALLOCATE SH_MAKE_NAME(allocate)
 #define SH_FREE SH_MAKE_NAME(free)
 #define SH_STAT SH_MAKE_NAME(stat)
+#define SH_ESTIMATE_SIZE SH_MAKE_NAME(estimate_size)
 
 /* internal helper functions (no externally visible prototypes) */
 #define SH_COMPUTE_PARAMETERS SH_MAKE_NAME(compute_parameters)
@@ -153,16 +154,22 @@ typedef struct SH_TYPE
 	/* boundary after which to grow hashtable */
 	uint32		grow_threshold;
 
+#ifndef SH_IN_PLACE
 	/* hash buckets */
 	SH_ELEMENT_TYPE *data;
+#endif
 
-#ifndef SH_RAW_ALLOCATOR
+#if !defined(SH_RAW_ALLOCATOR) && !defined(SH_IN_PLACE)
 	/* memory context to use for allocations */
 	MemoryContext ctx;
 #endif
 
 	/* user defined data, useful for callbacks */
 	void	   *private_data;
+
+#ifdef SH_IN_PLACE
+	SH_ELEMENT_TYPE data[FLEXIBLE_ARRAY_MEMBER];
+#endif
 }			SH_TYPE;
 
 typedef enum SH_STATUS
@@ -182,6 +189,11 @@ typedef struct SH_ITERATOR
 #ifdef SH_RAW_ALLOCATOR
 /* <prefix>_hash <prefix>_create(uint32 nelements, void *private_data) */
 SH_SCOPE	SH_TYPE *SH_CREATE(uint32 nelements, void *private_data);
+#elif defined(SH_IN_PLACE)
+/* size_t <prefix>_estimate_size(uint32 nelements) */
+SH_SCOPE	size_t SH_ESTIMATE_SIZE(uint32 nelements);
+/* void <prefix>_create(<prefix>_hash *place, uint32 nelements, void *private_data) */
+SH_SCOPE	void SH_CREATE(SH_TYPE *place, uint32 nelements, void *private_data);
 #else
 /*
  * <prefix>_hash <prefix>_create(MemoryContext ctx, uint32 nelements,
@@ -191,14 +203,18 @@ SH_SCOPE	SH_TYPE *SH_CREATE(MemoryContext ctx, uint32 nelements,
 							   void *private_data);
 #endif
 
+#ifndef SH_IN_PLACE
 /* void <prefix>_destroy(<prefix>_hash *tb) */
 SH_SCOPE void SH_DESTROY(SH_TYPE * tb);
+#endif
 
 /* void <prefix>_reset(<prefix>_hash *tb) */
 SH_SCOPE void SH_RESET(SH_TYPE * tb);
 
+#ifndef SH_IN_PLACE
 /* void <prefix>_grow(<prefix>_hash *tb) */
 SH_SCOPE void SH_GROW(SH_TYPE * tb, uint32 newsize);
+#endif
 
 /* <element> *<prefix>_insert(<prefix>_hash *tb, <key> key, bool *found) */
 SH_SCOPE	SH_ELEMENT_TYPE *SH_INSERT(SH_TYPE * tb, SH_KEY_TYPE key, bool *found);
@@ -241,7 +257,7 @@ SH_SCOPE void SH_STAT(SH_TYPE * tb);
 /* generate implementation of the hash table */
 #ifdef SH_DEFINE
 
-#ifndef SH_RAW_ALLOCATOR
+#if !defined(SH_RAW_ALLOCATOR) && !defined(SH_IN_PLACE)
 #include "utils/memutils.h"
 #endif
 
@@ -383,11 +399,13 @@ SH_ENTRY_HASH(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
 #endif
 }
 
+#ifndef SH_IN_PLACE
 /* default memory allocator function */
 static inline void *SH_ALLOCATE(SH_TYPE * type, Size size);
 static inline void SH_FREE(SH_TYPE * type, void *pointer);
+#endif
 
-#ifndef SH_USE_NONDEFAULT_ALLOCATOR
+#if !defined(SH_USE_NONDEFAULT_ALLOCATOR) && !defined(SH_IN_PLACE)
 
 /* default memory allocator function */
 static inline void *
@@ -410,6 +428,22 @@ SH_FREE(SH_TYPE * type, void *pointer)
 
 #endif
 
+#ifdef SH_IN_PLACE
+/*
+ * Compute the amount of memory required for a fixed sized in-place hash table.
+ */
+SH_SCOPE	size_t
+SH_ESTIMATE_SIZE(uint32 nelements)
+{
+	size_t		size;
+
+	size = Max(nelements, 2);
+	size = pg_nextpower2_64(size);
+
+	return offsetof(SH_TYPE, data) + sizeof(SH_ELEMENT_TYPE) * size;
+}
+#endif
+
 /*
  * Create a hash table with enough space for `nelements` distinct members.
  * Memory for the hash table is allocated from the passed-in context.  If
@@ -422,6 +456,9 @@ SH_FREE(SH_TYPE * type, void *pointer)
 #ifdef SH_RAW_ALLOCATOR
 SH_SCOPE	SH_TYPE *
 SH_CREATE(uint32 nelements, void *private_data)
+#elif defined(SH_IN_PLACE)
+SH_SCOPE	void
+SH_CREATE(SH_TYPE *place, uint32 nelements, void *private_data)
 #else
 SH_SCOPE	SH_TYPE *
 SH_CREATE(MemoryContext ctx, uint32 nelements, void *private_data)
@@ -432,6 +469,8 @@ SH_CREATE(MemoryContext ctx, uint32 nelements, void *private_data)
 
 #ifdef SH_RAW_ALLOCATOR
 	tb = SH_RAW_ALLOCATOR(sizeof(SH_TYPE));
+#elif defined(SH_IN_PLACE)
+	tb = place;
 #else
 	tb = MemoryContextAllocZero(ctx, sizeof(SH_TYPE));
 	tb->ctx = ctx;
@@ -443,11 +482,15 @@ SH_CREATE(MemoryContext ctx, uint32 nelements, void *private_data)
 
 	SH_COMPUTE_PARAMETERS(tb, size);
 
+#if defined(SH_IN_PLACE)
+	memset(&tb->data, 0, sizeof(SH_ELEMENT_TYPE) * tb->size);
+#else
 	tb->data = SH_ALLOCATE(tb, sizeof(SH_ELEMENT_TYPE) * tb->size);
-
 	return tb;
+#endif
 }
 
+#ifndef SH_IN_PLACE
 /* destroy a previously created hash table */
 SH_SCOPE void
 SH_DESTROY(SH_TYPE * tb)
@@ -455,6 +498,7 @@ SH_DESTROY(SH_TYPE * tb)
 	SH_FREE(tb, tb->data);
 	pfree(tb);
 }
+#endif
 
 /* reset the contents of a previously created hash table */
 SH_SCOPE void
@@ -464,6 +508,7 @@ SH_RESET(SH_TYPE * tb)
 	tb->members = 0;
 }
 
+#ifndef SH_IN_PLACE
 /*
  * Grow a hash table to at least `newsize` buckets.
  *
@@ -576,6 +621,7 @@ SH_GROW(SH_TYPE * tb, uint32 newsize)
 
 	SH_FREE(tb, olddata);
 }
+#endif
 
 /*
  * This is a separate static inline function, so it can be reliably be inlined
@@ -592,6 +638,7 @@ SH_INSERT_HASH_INTERNAL(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash, bool *found)
 restart:
 	insertdist = 0;
 
+#ifndef SH_IN_PLACE
 	/*
 	 * We do the grow check even if the key is actually present, to avoid
 	 * doing the check inside the loop. This also lets us avoid having to
@@ -614,6 +661,7 @@ restart:
 		SH_GROW(tb, tb->size * 2);
 		/* SH_STAT(tb); */
 	}
+#endif
 
 	/* perform insert, start bucket search at optimal location */
 	data = tb->data;
-- 
2.30.1

From 3bbaf8d558e1e405217e17d1a502ff68556b21e3 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 27 Mar 2021 09:04:56 +1300
Subject: [PATCH v13 3/5] Support intrusive status flag in simplehash.

Before, you had to include a "status" member in the element type, which
simplehash.h could use to detect free space.  Allow the user to specify
a special key value to use instead, for more compact representation.
---
 src/include/lib/simplehash.h | 66 ++++++++++++++++++++++++++----------
 1 file changed, 48 insertions(+), 18 deletions(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 32d3fa58fe..05c7ca8a47 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -131,6 +131,8 @@
 #define SH_ENTRY_HASH SH_MAKE_NAME(entry_hash)
 #define SH_INSERT_HASH_INTERNAL SH_MAKE_NAME(insert_hash_internal)
 #define SH_LOOKUP_HASH_INTERNAL SH_MAKE_NAME(lookup_hash_internal)
+#define SH_ENTRY_IS_EMPTY SH_MAKE_NAME(entry_is_empty)
+#define SH_SET_ENTRY_EMPTY SH_MAKE_NAME(set_entry_empty)
 
 /* generate forward declarations necessary to use the hash table */
 #ifdef SH_DECLARE
@@ -172,11 +174,13 @@ typedef struct SH_TYPE
 #endif
 }			SH_TYPE;
 
+#ifndef SH_IS_EMPTY_KEY
 typedef enum SH_STATUS
 {
 	SH_STATUS_EMPTY = 0x00,
 	SH_STATUS_IN_USE = 0x01
 } SH_STATUS;
+#endif
 
 typedef struct SH_ITERATOR
 {
@@ -309,6 +313,26 @@ SH_SCOPE void SH_STAT(SH_TYPE * tb);
 
 #endif
 
+static inline bool
+SH_ENTRY_IS_EMPTY(SH_TYPE * tb, SH_ELEMENT_TYPE * entry)
+{
+#ifdef SH_IS_EMPTY_KEY
+	return SH_IS_EMPTY_KEY(tb, entry->SH_KEY);
+#else
+	return entry->status == SH_STATUS_EMPTY;
+#endif
+}
+
+static inline void
+SH_SET_ENTRY_EMPTY(SH_TYPE * tb, SH_ELEMENT_TYPE *entry)
+{
+#ifdef SH_EMPTY_KEY
+	entry->SH_KEY = SH_EMPTY_KEY(tb);
+#else
+	entry->status = SH_STATUS_EMPTY;
+#endif
+}
+
 /*
  * Compute sizing parameters for hashtable. Called when creating and growing
  * the hashtable.
@@ -483,7 +507,7 @@ SH_CREATE(MemoryContext ctx, uint32 nelements, void *private_data)
 	SH_COMPUTE_PARAMETERS(tb, size);
 
 #if defined(SH_IN_PLACE)
-	memset(&tb->data, 0, sizeof(SH_ELEMENT_TYPE) * tb->size);
+	SH_RESET(tb);
 #else
 	tb->data = SH_ALLOCATE(tb, sizeof(SH_ELEMENT_TYPE) * tb->size);
 	return tb;
@@ -504,7 +528,12 @@ SH_DESTROY(SH_TYPE * tb)
 SH_SCOPE void
 SH_RESET(SH_TYPE * tb)
 {
+#ifdef SH_EMPTY_KEY
+	for (size_t i = 0; i < tb->size; ++i)
+		tb->data[i].SH_KEY = SH_EMPTY_KEY(tb);
+#else
 	memset(tb->data, 0, sizeof(SH_ELEMENT_TYPE) * tb->size);
+#endif
 	tb->members = 0;
 }
 
@@ -675,14 +704,17 @@ restart:
 		SH_ELEMENT_TYPE *entry = &data[curelem];
 
 		/* any empty bucket can directly be used */
-		if (entry->status == SH_STATUS_EMPTY)
+		if (SH_ENTRY_IS_EMPTY(tb, entry))
 		{
 			tb->members++;
 			entry->SH_KEY = key;
 #ifdef SH_STORE_HASH
 			SH_GET_HASH(tb, entry) = hash;
 #endif
+#ifndef SH_IS_EMPTY_KEY
 			entry->status = SH_STATUS_IN_USE;
+#endif
+			Assert(!SH_ENTRY_IS_EMPTY(tb, entry));
 			*found = false;
 			return entry;
 		}
@@ -697,7 +729,7 @@ restart:
 
 		if (SH_COMPARE_KEYS(tb, hash, key, entry))
 		{
-			Assert(entry->status == SH_STATUS_IN_USE);
+			Assert(!SH_ENTRY_IS_EMPTY(tb, entry));
 			*found = true;
 			return entry;
 		}
@@ -721,7 +753,7 @@ restart:
 				emptyelem = SH_NEXT(tb, emptyelem, startelem);
 				emptyentry = &data[emptyelem];
 
-				if (emptyentry->status == SH_STATUS_EMPTY)
+				if (SH_ENTRY_IS_EMPTY(tb, emptyentry))
 				{
 					lastentry = emptyentry;
 					break;
@@ -770,7 +802,10 @@ restart:
 #ifdef SH_STORE_HASH
 			SH_GET_HASH(tb, entry) = hash;
 #endif
+#ifndef SH_IS_EMPTY_KEY
 			entry->status = SH_STATUS_IN_USE;
+#endif
+			Assert(!SH_ENTRY_IS_EMPTY(tb, entry));
 			*found = false;
 			return entry;
 		}
@@ -833,12 +868,8 @@ SH_LOOKUP_HASH_INTERNAL(SH_TYPE * tb, SH_KEY_TYPE key, uint32 hash)
 	{
 		SH_ELEMENT_TYPE *entry = &tb->data[curelem];
 
-		if (entry->status == SH_STATUS_EMPTY)
-		{
+		if (SH_ENTRY_IS_EMPTY(tb, entry))
 			return NULL;
-		}
-
-		Assert(entry->status == SH_STATUS_IN_USE);
 
 		if (SH_COMPARE_KEYS(tb, hash, key, entry))
 			return entry;
@@ -891,11 +922,10 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 	{
 		SH_ELEMENT_TYPE *entry = &tb->data[curelem];
 
-		if (entry->status == SH_STATUS_EMPTY)
+		if (SH_ENTRY_IS_EMPTY(tb, entry))
 			return false;
 
-		if (entry->status == SH_STATUS_IN_USE &&
-			SH_COMPARE_KEYS(tb, hash, key, entry))
+		if (SH_COMPARE_KEYS(tb, hash, key, entry))
 		{
 			SH_ELEMENT_TYPE *lastentry = entry;
 
@@ -917,9 +947,9 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 				curelem = SH_NEXT(tb, curelem, startelem);
 				curentry = &tb->data[curelem];
 
-				if (curentry->status != SH_STATUS_IN_USE)
+				if (SH_ENTRY_IS_EMPTY(tb, curentry))
 				{
-					lastentry->status = SH_STATUS_EMPTY;
+					SH_SET_ENTRY_EMPTY(tb, lastentry);
 					break;
 				}
 
@@ -929,7 +959,7 @@ SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key)
 				/* current is at optimal position, done */
 				if (curoptimal == curelem)
 				{
-					lastentry->status = SH_STATUS_EMPTY;
+					SH_SET_ENTRY_EMPTY(tb, lastentry);
 					break;
 				}
 
@@ -966,7 +996,7 @@ SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter)
 	{
 		SH_ELEMENT_TYPE *entry = &tb->data[i];
 
-		if (entry->status != SH_STATUS_IN_USE)
+		if (SH_ENTRY_IS_EMPTY(tb, entry))
 		{
 			startelem = i;
 			break;
@@ -1027,7 +1057,7 @@ SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter)
 
 		if ((iter->cur & tb->sizemask) == (iter->end & tb->sizemask))
 			iter->done = true;
-		if (elem->status == SH_STATUS_IN_USE)
+		if (!SH_ENTRY_IS_EMPTY(tb, elem))
 		{
 			return elem;
 		}
@@ -1063,7 +1093,7 @@ SH_STAT(SH_TYPE * tb)
 
 		elem = &tb->data[i];
 
-		if (elem->status != SH_STATUS_IN_USE)
+		if (SH_ENTRY_IS_EMPTY(tb, elem))
 			continue;
 
 		hash = SH_ENTRY_HASH(tb, elem);
-- 
2.30.1

From 970504d304af469e494de6ca83123caf1a683ecf Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 25 Mar 2021 10:11:31 +1300
Subject: [PATCH v13 4/5] Add buffer mapping table for SLRUs.

Instead of doing a linear search for the buffer holding a given page
number, use a hash table.

Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86%40yandex-team.ru
---
 src/backend/access/transam/slru.c | 111 +++++++++++++++++++++++++-----
 src/include/access/slru.h         |   2 +
 2 files changed, 96 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 82149ad782..f823c2a0c8 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -58,6 +58,8 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/dynahash.h"
+#include "utils/hsearch.h"
 
 #define SlruFileName(ctl, path, seg) \
 	snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
@@ -79,6 +81,12 @@ typedef struct SlruWriteAllData
 
 typedef struct SlruWriteAllData *SlruWriteAll;
 
+typedef struct SlruMappingTableEntry
+{
+	int			pageno;
+	int			slotno;
+} SlruMappingTableEntry;
+
 /*
  * Populate a file tag describing a segment file.  We only use the segment
  * number, since we can derive everything else we need by having separate
@@ -146,6 +154,9 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
+static void	SlruMappingAdd(SlruCtl ctl, int pageno, int slotno);
+static void	SlruMappingRemove(SlruCtl ctl, int pageno);
+static int	SlruMappingFind(SlruCtl ctl, int pageno);
 
 /*
  * Initialization of shared memory
@@ -168,7 +179,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
 
-	return BUFFERALIGN(sz) + BLCKSZ * nslots;
+	return BUFFERALIGN(sz) + BLCKSZ * nslots +
+		hash_estimate_size(nslots, sizeof(SlruMappingTableEntry));
 }
 
 /*
@@ -187,6 +199,9 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			  LWLock *ctllock, const char *subdir, int tranche_id,
 			  SyncRequestHandler sync_handler)
 {
+	char		mapping_table_name[SHMEM_INDEX_KEYSIZE];
+	HASHCTL		mapping_table_info;
+	HTAB	   *mapping_table;
 	SlruShared	shared;
 	bool		found;
 
@@ -258,11 +273,21 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	else
 		Assert(found);
 
+	/* Create or find the buffer mapping table. */
+	memset(&mapping_table_info, 0, sizeof(mapping_table_info));
+	mapping_table_info.keysize = sizeof(int);
+	mapping_table_info.entrysize = sizeof(SlruMappingTableEntry);
+	snprintf(mapping_table_name, sizeof(mapping_table_name),
+			 "%s Mapping Table", name);
+	mapping_table = ShmemInitHash(mapping_table_name, nslots, nslots,
+								  &mapping_table_info, HASH_ELEM | HASH_BLOBS);
+
 	/*
 	 * Initialize the unshared control struct, including directory path. We
 	 * assume caller set PagePrecedes.
 	 */
 	ctl->shared = shared;
+	ctl->mapping_table = mapping_table;
 	ctl->sync_handler = sync_handler;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
@@ -289,6 +314,9 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 		   shared->page_number[slotno] == pageno);
 
 	/* Mark the slot as containing this page */
+	if (shared->page_status[slotno] != SLRU_PAGE_EMPTY)
+		SlruMappingRemove(ctl, shared->page_number[slotno]);
+	SlruMappingAdd(ctl, pageno, slotno);
 	shared->page_number[slotno] = pageno;
 	shared->page_status[slotno] = SLRU_PAGE_VALID;
 	shared->page_dirty[slotno] = true;
@@ -362,7 +390,10 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
 		{
 			/* indeed, the I/O must have failed */
 			if (shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS)
+			{
+				SlruMappingRemove(ctl, shared->page_number[slotno]);
 				shared->page_status[slotno] = SLRU_PAGE_EMPTY;
+			}
 			else				/* write_in_progress */
 			{
 				shared->page_status[slotno] = SLRU_PAGE_VALID;
@@ -436,6 +467,9 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 				!shared->page_dirty[slotno]));
 
 		/* Mark the slot read-busy */
+		if (shared->page_status[slotno] != SLRU_PAGE_EMPTY)
+			SlruMappingRemove(ctl, shared->page_number[slotno]);
+		SlruMappingAdd(ctl, pageno, slotno);
 		shared->page_number[slotno] = pageno;
 		shared->page_status[slotno] = SLRU_PAGE_READ_IN_PROGRESS;
 		shared->page_dirty[slotno] = false;
@@ -459,7 +493,13 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
 			   !shared->page_dirty[slotno]);
 
-		shared->page_status[slotno] = ok ? SLRU_PAGE_VALID : SLRU_PAGE_EMPTY;
+		if (ok)
+			shared->page_status[slotno] = SLRU_PAGE_VALID;
+		else
+		{
+			SlruMappingRemove(ctl, pageno);
+			shared->page_status[slotno] =  SLRU_PAGE_EMPTY;
+		}
 
 		LWLockRelease(&shared->buffer_locks[slotno].lock);
 
@@ -500,20 +540,20 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
 	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	slotno = SlruMappingFind(ctl, pageno);
+	if (slotno >= 0 &&
+		shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
 	{
-		if (shared->page_number[slotno] == pageno &&
-			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
-			shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
-		{
-			/* See comments for SlruRecentlyUsed macro */
-			SlruRecentlyUsed(shared, slotno);
+		Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+		Assert(shared->page_number[slotno] == pageno);
 
-			/* update the stats counter of pages found in the SLRU */
-			pgstat_count_slru_page_hit(shared->slru_stats_idx);
+		/* See comments for SlruRecentlyUsed macro */
+		SlruRecentlyUsed(shared, slotno);
 
-			return slotno;
-		}
+		/* update the stats counter of pages found in the SLRU */
+		pgstat_count_slru_page_hit(shared->slru_stats_idx);
+
+		return slotno;
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
@@ -1029,11 +1069,12 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
 
 		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		slotno = SlruMappingFind(ctl, pageno);
+		if (slotno >= 0)
 		{
-			if (shared->page_number[slotno] == pageno &&
-				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
-				return slotno;
+			Assert(shared->page_number[slotno] == pageno);
+			Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+			return slotno;
 		}
 
 		/*
@@ -1266,6 +1307,7 @@ restart:;
 		if (shared->page_status[slotno] == SLRU_PAGE_VALID &&
 			!shared->page_dirty[slotno])
 		{
+			SlruMappingRemove(ctl, shared->page_number[slotno]);
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
 			continue;
 		}
@@ -1348,6 +1390,7 @@ restart:
 		if (shared->page_status[slotno] == SLRU_PAGE_VALID &&
 			!shared->page_dirty[slotno])
 		{
+			SlruMappingRemove(ctl, shared->page_number[slotno]);
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
 			continue;
 		}
@@ -1609,3 +1652,37 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+static int
+SlruMappingFind(SlruCtl ctl, int pageno)
+{
+	SlruMappingTableEntry *mapping;
+
+	mapping = hash_search(ctl->mapping_table, &pageno, HASH_FIND, NULL);
+	if (mapping)
+		return mapping->slotno;
+
+	return -1;
+}
+
+static void
+SlruMappingAdd(SlruCtl ctl, int pageno, int slotno)
+{
+	SlruMappingTableEntry *mapping;
+	bool		found PG_USED_FOR_ASSERTS_ONLY;
+
+	mapping = hash_search(ctl->mapping_table, &pageno, HASH_ENTER, &found);
+	mapping->slotno = slotno;
+
+	Assert(!found);
+}
+
+static void
+SlruMappingRemove(SlruCtl ctl, int pageno)
+{
+	bool		found PG_USED_FOR_ASSERTS_ONLY;
+
+	hash_search(ctl->mapping_table, &pageno, HASH_REMOVE, &found);
+
+	Assert(found);
+}
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index dd52e8cec7..8aa3efc0ee 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -16,6 +16,7 @@
 #include "access/xlogdefs.h"
 #include "storage/lwlock.h"
 #include "storage/sync.h"
+#include "utils/hsearch.h"
 
 
 /*
@@ -110,6 +111,7 @@ typedef SlruSharedData *SlruShared;
 typedef struct SlruCtlData
 {
 	SlruShared	shared;
+	HTAB	   *mapping_table;
 
 	/*
 	 * Which sync handler function to use when handing sync requests over to
-- 
2.30.1

From ef3c1e45a1f869544280f65d1eeba8544b028735 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 27 Mar 2021 09:07:22 +1300
Subject: [PATCH v13 5/5] fixup: use simplehash instead of dynahash

---
 src/backend/access/transam/slru.c | 41 ++++++++++++++++++++-----------
 src/include/access/slru.h         |  4 ++-
 2 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index f823c2a0c8..ba9c0efc81 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -54,12 +54,11 @@
 #include "access/slru.h"
 #include "access/transam.h"
 #include "access/xlog.h"
+#include "common/hashfn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
-#include "utils/dynahash.h"
-#include "utils/hsearch.h"
 
 #define SlruFileName(ctl, path, seg) \
 	snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
@@ -85,8 +84,24 @@ typedef struct SlruMappingTableEntry
 {
 	int			pageno;
 	int			slotno;
+	char		status;
 } SlruMappingTableEntry;
 
+/* Instantiate specialized hash table routines. */
+#define SH_PREFIX smte
+#define SH_ELEMENT_TYPE SlruMappingTableEntry
+#define SH_KEY_TYPE int
+#define SH_KEY pageno
+#define SH_HASH_KEY(table, key) murmurhash32(key)
+#define SH_EQUAL(table, a, b) a == b
+#define SH_IS_EMPTY_KEY(table, pageno) pageno == -1
+#define SH_EMPTY_KEY(table) -1
+#define SH_DECLARE
+#define SH_DEFINE
+#define SH_SCOPE static inline
+#define SH_IN_PLACE
+#include "lib/simplehash.h"
+
 /*
  * Populate a file tag describing a segment file.  We only use the segment
  * number, since we can derive everything else we need by having separate
@@ -179,8 +194,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
 
-	return BUFFERALIGN(sz) + BLCKSZ * nslots +
-		hash_estimate_size(nslots, sizeof(SlruMappingTableEntry));
+	return BUFFERALIGN(sz) + BLCKSZ * nslots + smte_estimate_size(nslots);
 }
 
 /*
@@ -200,8 +214,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			  SyncRequestHandler sync_handler)
 {
 	char		mapping_table_name[SHMEM_INDEX_KEYSIZE];
-	HASHCTL		mapping_table_info;
-	HTAB	   *mapping_table;
+	smte_hash  *mapping_table;
 	SlruShared	shared;
 	bool		found;
 
@@ -274,13 +287,13 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(found);
 
 	/* Create or find the buffer mapping table. */
-	memset(&mapping_table_info, 0, sizeof(mapping_table_info));
-	mapping_table_info.keysize = sizeof(int);
-	mapping_table_info.entrysize = sizeof(SlruMappingTableEntry);
 	snprintf(mapping_table_name, sizeof(mapping_table_name),
 			 "%s Mapping Table", name);
-	mapping_table = ShmemInitHash(mapping_table_name, nslots, nslots,
-								  &mapping_table_info, HASH_ELEM | HASH_BLOBS);
+	mapping_table = ShmemInitStruct(mapping_table_name,
+									smte_estimate_size(nslots),
+									&found);
+	if (!found)
+		smte_create(mapping_table, nslots, NULL);
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -1658,7 +1671,7 @@ SlruMappingFind(SlruCtl ctl, int pageno)
 {
 	SlruMappingTableEntry *mapping;
 
-	mapping = hash_search(ctl->mapping_table, &pageno, HASH_FIND, NULL);
+	mapping = smte_lookup(ctl->mapping_table, pageno);
 	if (mapping)
 		return mapping->slotno;
 
@@ -1671,7 +1684,7 @@ SlruMappingAdd(SlruCtl ctl, int pageno, int slotno)
 	SlruMappingTableEntry *mapping;
 	bool		found PG_USED_FOR_ASSERTS_ONLY;
 
-	mapping = hash_search(ctl->mapping_table, &pageno, HASH_ENTER, &found);
+	mapping = smte_insert(ctl->mapping_table, pageno, &found);
 	mapping->slotno = slotno;
 
 	Assert(!found);
@@ -1682,7 +1695,7 @@ SlruMappingRemove(SlruCtl ctl, int pageno)
 {
 	bool		found PG_USED_FOR_ASSERTS_ONLY;
 
-	hash_search(ctl->mapping_table, &pageno, HASH_REMOVE, &found);
+	found = smte_delete(ctl->mapping_table, pageno);
 
 	Assert(found);
 }
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 8aa3efc0ee..b3d28ff135 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -104,6 +104,8 @@ typedef struct SlruSharedData
 
 typedef SlruSharedData *SlruShared;
 
+struct smte_hash;
+
 /*
  * SlruCtlData is an unshared structure that points to the active information
  * in shared memory.
@@ -111,7 +113,7 @@ typedef SlruSharedData *SlruShared;
 typedef struct SlruCtlData
 {
 	SlruShared	shared;
-	HTAB	   *mapping_table;
+	struct smte_hash *mapping_table;
 
 	/*
 	 * Which sync handler function to use when handing sync requests over to
-- 
2.30.1

Reply via email to