Hi,

Here's a somewhat more polished version of this patch series. I only
propose 0001 and 0002 for eventual commit; the other two parts are just
there to help with benchmarking etc.

0001
----
Increases the size of the fast-path arrays, but uses a hard-coded number
of groups (64, so 1024 locks) and leaves everything in PGPROC

0002
----
Allocates the arrays separately from PGPROC, and sets the number of
groups based on max_locks_per_transaction

0001 and 0002 should be in fairly good shape, IMO. There are a couple of
cosmetic things that bother me (e.g. the way it Asserts after each
FAST_PATH_LOCK_REL_GROUP call seems distracting).

But other than that I think it's fine, so reviews / opinions would be
very welcome.


0003
----
Adds a separate GUC to make benchmarking easier (without the impact of
changing the size of the lock table).

I think the agreement is not to add a new GUC unless it turns out to be
necessary in the future. So 0003 is just there to make benchmarking a bit
easier.


0004
----
This was a quick attempt to track the fraction of fast-path locks;
adding the infrastructure is mostly mechanical. But it turns out it's not
quite trivial to track why a lock did not use the fast path. It might be
because it wouldn't fit, because it's not eligible, or because someone
holds a stronger lock. It's not obvious how to count these cases in a way
that helps with evaluating the number of fast-path slots.


regards

-- 
Tomas Vondra
From 6877dfa7cd94c9f541689d9fe211bdcfaf8bbbdc Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Mon, 2 Sep 2024 00:55:13 +0200
Subject: [PATCH v20240905 1/4] Increase the number of fast-path lock slots

The fast-path locking introduced in 9.2 allows each backend to obtain up
to 16 relation locks cheaply, provided the lock mode is weak enough (not
exclusive, etc.). If a backend needs more locks, it has to insert them
into the lock table in shared memory, which is considerably more
expensive.

The limit of 16 entries was always rather low. We need to lock all
relations - not just tables, but also indexes. And for planning we need
to lock all relations that might be used by a query, not just those in
the final plan. So it was common to use all the fast-path slots even
with simple schemas and queries.

But as partitioning gets more widely used, with an ever increasing number
of partitions, this bottleneck is becoming easier to hit, especially on
large machines with enough memory to keep the queried data cached and
enough cores to cause contention on the shared lock table.

This patch addresses that by increasing the number of fast-path slots
from 16 to 1024, structured as a 16-way set-associative cache. The cache
is divided into groups of 16 slots, and each lock is mapped to exactly
one of those groups (by hashing the relation OID). Entries within a group
are searched linearly.

We could treat the whole array as a single hash table, but that would
degrade as it gets full (the cache is in shared memory, so we can't
resize it easily to keep the load factor low). It would probably also
have worse locality, due to more random access.

If a group is full, we can simply insert the new lock into the shared
lock table. This is the same as for the original code with 16 slots. Of
course, if this happens too often, that reduces the benefit.

To map relids to groups we use a trivial hash function of the form

    h(relid) = ((relid * P) mod N)

where P is a hard-coded prime number, and N is the number of groups.
This is fast and works quite well - the main purpose is to map relids to
different groups, so that we don't get "hot groups" while the rest of
the groups are almost empty. If the relids are already spread out, the
hash function is unlikely to group them. If the relids are sequential
(e.g. for tables created by a script), the multiplication will spread
them around.
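
As a rough illustration, here is a standalone sketch of the mapping and
the in-group lookup (hypothetical names, not the actual patch code - the
real macros and functions are in the diff below, and the real lookup also
checks the per-slot lock-mode bits):

    #include <stdint.h>
    #include <stdbool.h>

    #define FP_LOCK_GROUPS_PER_BACKEND  64   /* hard-coded in this patch */
    #define FP_LOCK_SLOTS_PER_GROUP     16

    typedef uint32_t Oid;

    /* h(relid) = ((relid * P) mod N) with P = 49157, N = number of groups */
    static inline uint32_t
    fp_lock_group(Oid relid)
    {
        return (uint32_t) (((uint64_t) relid * 49157) % FP_LOCK_GROUPS_PER_BACKEND);
    }

    /* Linear search for relid within its group of 16 slots. */
    static bool
    fp_lock_find(const Oid *fpRelId, Oid relid, uint32_t *slot)
    {
        uint32_t    group = fp_lock_group(relid);

        for (uint32_t i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
        {
            uint32_t    f = group * FP_LOCK_SLOTS_PER_GROUP + i;

            if (fpRelId[f] == relid)
            {
                *slot = f;
                return true;
            }
        }

        return false;       /* relid not present in its group */
    }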

Note: This hard-codes the number of groups to 64, which means 1024
fast-path locks. This should either be made configurable or, even better,
derived from some existing GUC.
---
 src/backend/storage/lmgr/lock.c | 148 +++++++++++++++++++++++++++-----
 src/include/storage/proc.h      |   8 +-
 2 files changed, 132 insertions(+), 24 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 83b99a98f08..f41e4a33f06 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -167,7 +167,7 @@ typedef struct TwoPhaseLockRecord
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
  */
-static int	FastPathLocalUseCount = 0;
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -184,23 +184,56 @@ static int	FastPathLocalUseCount = 0;
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Macros to calculate the group and index for a relation.
+ *
+ * The formula is a simple hash function, designed to spread the OIDs a bit,
+ * so that even contiguous values end up in different groups. In most cases
+ * there will be gaps anyway, but the multiplication should help a bit.
+ *
+ * The selected value (49157) is a prime not too close to 2^k, and it's
+ * small enough to not cause overflows (in 64-bit).
+ *
+ * XXX Maybe it'd be easier / cheaper to just do this in 32-bits? If we
+ * did (rel % 100000) or something like that first, that'd be enough to
+ * not wrap around. But even if it wrapped, would that be a problem?
+ */
+#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+
+/*
+ * Given a lock index (into the per-backend array), calculated using the
+ * FP_LOCK_SLOT_INDEX macro, calculate group and index (within the group).
+ */
+#define FAST_PATH_LOCK_GROUP(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) / FP_LOCK_SLOTS_PER_GROUP))
+#define FAST_PATH_LOCK_INDEX(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) % FP_LOCK_SLOTS_PER_GROUP))
+
+/* Calculate index in the whole per-backend array of lock slots. */
+#define FP_LOCK_SLOT_INDEX(group, index) \
+	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
+	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
+
 /* Macros for manipulating proc->fpLockBits */
 #define FAST_PATH_BITS_PER_SLOT			3
 #define FAST_PATH_LOCKNUMBER_OFFSET		1
 #define FAST_PATH_MASK					((1 << FAST_PATH_BITS_PER_SLOT) - 1)
 #define FAST_PATH_GET_BITS(proc, n) \
-	(((proc)->fpLockBits >> (FAST_PATH_BITS_PER_SLOT * n)) & FAST_PATH_MASK)
+	(((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] >> (FAST_PATH_BITS_PER_SLOT * FAST_PATH_LOCK_INDEX(n))) & FAST_PATH_MASK)
 #define FAST_PATH_BIT_POSITION(n, l) \
 	(AssertMacro((l) >= FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((l) < FAST_PATH_BITS_PER_SLOT+FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((n) < FP_LOCK_SLOTS_PER_BACKEND), \
-	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (n)))
+	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (FAST_PATH_LOCK_INDEX(n))))
 #define FAST_PATH_SET_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
 #define FAST_PATH_CLEAR_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
 #define FAST_PATH_CHECK_LOCKMODE(proc, n, l) \
-	 ((proc)->fpLockBits & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
+	 ((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
 
 /*
  * The fast-path lock mechanism is concerned only with relation locks on
@@ -926,7 +959,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 * for now we don't worry about that case either.
 	 */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND)
+		FastPathLocalUseCounts[FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2)] < FP_LOCK_SLOTS_PER_GROUP)
 	{
 		uint32		fasthashcode = FastPathStrongLockHashPartition(hashcode);
 		bool		acquired;
@@ -1970,6 +2003,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	int			group;
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2063,9 +2097,14 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	 */
 	locallock->lockCleared = false;
 
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
 	/* Attempt fast release of any lock eligible for the fast path. */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount > 0)
+		FastPathLocalUseCounts[group] > 0)
 	{
 		bool		released;
 
@@ -2633,12 +2672,26 @@ LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent)
 static bool
 FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		f;
 	uint32		unused_slot = FP_LOCK_SLOTS_PER_BACKEND;
+	uint32		i,
+				group;
+
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	/* Scan for existing entry for this relid, remembering empty slot. */
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
+		/* must not overflow the array of all locks for a backend */
+		Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
+
 		if (FAST_PATH_GET_BITS(MyProc, f) == 0)
 			unused_slot = f;
 		else if (MyProc->fpRelId[f] == relid)
@@ -2654,7 +2707,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	{
 		MyProc->fpRelId[unused_slot] = relid;
 		FAST_PATH_SET_LOCKMODE(MyProc, unused_slot, lockmode);
-		++FastPathLocalUseCount;
+		++FastPathLocalUseCounts[group];
 		return true;
 	}
 
@@ -2670,12 +2723,26 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 static bool
 FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		f;
 	bool		result = false;
+	uint32		i,
+				group;
+
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	FastPathLocalUseCount = 0;
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+
+	FastPathLocalUseCounts[group] = 0;
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
+		/* must not overflow the array of all locks for a backend */
+		Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
+
 		if (MyProc->fpRelId[f] == relid
 			&& FAST_PATH_CHECK_LOCKMODE(MyProc, f, lockmode))
 		{
@@ -2685,7 +2752,7 @@ FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 			/* we continue iterating so as to update FastPathLocalUseCount */
 		}
 		if (FAST_PATH_GET_BITS(MyProc, f) != 0)
-			++FastPathLocalUseCount;
+			++FastPathLocalUseCounts[group];
 	}
 	return result;
 }
@@ -2714,7 +2781,8 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 	for (i = 0; i < ProcGlobal->allProcCount; i++)
 	{
 		PGPROC	   *proc = &ProcGlobal->allProcs[i];
-		uint32		f;
+		uint32		j,
+					group;
 
 		LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
 
@@ -2739,9 +2807,21 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 			continue;
 		}
 
-		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+		/* Which FP group does the lock belong to? */
+		group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+		Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+
+		for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 		{
 			uint32		lockmode;
+			uint32		f;
+
+			/* index into the whole per-backend array */
+			f = FP_LOCK_SLOT_INDEX(group, j);
+
+			/* must not overflow the array of all locks for a backend */
+			Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
 
 			/* Look for an allocated slot matching the given relid. */
 			if (relid != proc->fpRelId[f] || FAST_PATH_GET_BITS(proc, f) == 0)
@@ -2793,13 +2873,26 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	PROCLOCK   *proclock = NULL;
 	LWLock	   *partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid			relid = locktag->locktag_field2;
-	uint32		f;
+	uint32		i,
+				group;
+
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
 
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
 		uint32		lockmode;
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
+		/* must not overflow the array of all locks for a backend */
+		Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
 
 		/* Look for an allocated slot matching the given relid. */
 		if (relid != MyProc->fpRelId[f] || FAST_PATH_GET_BITS(MyProc, f) == 0)
@@ -2903,6 +2996,12 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	LWLock	   *partitionLock;
 	int			count = 0;
 	int			fast_count = 0;
+	uint32		group;
+
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2957,7 +3056,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 		for (i = 0; i < ProcGlobal->allProcCount; i++)
 		{
 			PGPROC	   *proc = &ProcGlobal->allProcs[i];
-			uint32		f;
+			uint32		j;
 
 			/* A backend never blocks itself */
 			if (proc == MyProc)
@@ -2979,9 +3078,16 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 				continue;
 			}
 
-			for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+			for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 			{
 				uint32		lockmask;
+				uint32		f;
+
+				/* index into the whole per-backend array */
+				f = FP_LOCK_SLOT_INDEX(group, j);
+
+				/* must not overflow the array of all locks for a backend */
+				Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
 
 				/* Look for an allocated slot matching the given relid. */
 				if (relid != proc->fpRelId[f])
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index deeb06c9e01..845058da9fa 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,8 +83,9 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_SLOTS_PER_BACKEND 16
-
+#define		FP_LOCK_GROUPS_PER_BACKEND	64
+#define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -292,7 +293,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits;		/* lock modes held for each fast-path slot */
+	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND]; /* lock modes held for
+														 * each fast-path slot */
 	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
-- 
2.46.0

From 9eaa679b5adea3a842eb944927d77f3d447646fe Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Thu, 5 Sep 2024 18:14:09 +0200
Subject: [PATCH v20240905 2/4] Size fast-path slots using
 max_locks_per_transaction

Instead of using a hard-coded value of 64 groups (1024 fast-path slots),
determine the number of groups based on the max_locks_per_transaction
GUC. The size is calculated at startup, before allocating shared memory.

The default max_locks_per_transaction value of 64 means 4 fast-path
groups.

The max_locks_per_transaction GUC is the best information we have about
how many locks to expect per backend, but its main purpose is to size the
shared lock table. It is often set to the average number of locks needed
by a backend, while some backends may need substantially more.

This means the fast-path capacity calculated from
max_locks_per_transaction may not be sufficient for those lock-hungry
backends, forcing them to use the shared lock table. If that is a
problem, the only solution is to increase the GUC, even if the capacity
of the shared lock table was already sufficient. That is not free,
because each entry in the shared lock table requires almost 500B.

The assumption is that this is not an issue. Either there are only a few
of those lock-intensive backends, in which case contention on the shared
lock table is not a problem, or there are enough of them to actually
warrant a higher max_locks_per_transaction value.

It may turn out we actually need a separate GUC for fast-path locking,
but let's not add one until we're sure that's actually the case.

An alternative approach might be to size the fast-path arrays as a
multiple of max_locks_per_transaction. The cost of adding a fast-path
slot is much lower (only ~5B, compared to ~500B for a shared lock table
entry), so this would be cheaper than increasing
max_locks_per_transaction. But it's not clear what multiple of
max_locks_per_transaction to use.
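
For illustration: with the default max_locks_per_transaction = 64 this
picks 4 groups, i.e. 64 fast-path slots per backend, and the per-slot
cost in shared memory is roughly

    fpLockBits:  8B (uint64) per group of 16 slots  = 0.5B per slot
    fpRelId:     4B (one Oid) per slot              = 4.0B per slot
    total                                           ~ 4.5B per slot

which is where the ~5B figure above comes from.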
---
 src/backend/bootstrap/bootstrap.c   |  2 ++
 src/backend/postmaster/postmaster.c |  5 +++
 src/backend/storage/lmgr/lock.c     | 34 +++++++++++++++------
 src/backend/storage/lmgr/proc.c     | 47 +++++++++++++++++++++++++++++
 src/backend/tcop/postgres.c         |  3 ++
 src/backend/utils/init/postinit.c   | 34 +++++++++++++++++++++
 src/include/miscadmin.h             |  1 +
 src/include/storage/proc.h          | 11 ++++---
 8 files changed, 123 insertions(+), 14 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 7637581a184..ed59dfce893 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -309,6 +309,8 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 
 	InitializeMaxBackends();
 
+	InitializeFastPathLocks();
+
 	CreateSharedMemoryAndSemaphores();
 
 	/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 96bc1d1cfed..f4a16595d7f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -903,6 +903,11 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	InitializeMaxBackends();
 
+	/*
+	 * Also calculate the size of the fast-path lock arrays in PGPROC.
+	 */
+	InitializeFastPathLocks();
+
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index f41e4a33f06..134cd8a6e34 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -166,8 +166,13 @@ typedef struct TwoPhaseLockRecord
  * might be higher than the real number if another backend has transferred
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
+ *
+ * XXX Allocate a static array of the maximum size. We could have a pointer
+ * and then allocate just the right size to save a couple kB, but that does
+ * not seem worth the extra complexity of having to initialize it etc. This
+ * way it gets initialized automatically.
  */
-static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND_MAX];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -184,6 +189,17 @@ static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Number of fast-path locks per backend - size of the arrays in PGPROC.
+ * This is set only once during start, before initializing shared memory,
+ * and remains constant after that.
+ *
+ * We set the limit based on max_locks_per_transaction GUC, because that's
+ * the best information about expected number of locks per backend we have.
+ * See InitializeFastPathLocks for details.
+ */
+int			FastPathLockGroupsPerBackend = 0;
+
 /*
  * Macros to calculate the group and index for a relation.
  *
@@ -198,7 +214,7 @@ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
  * did (rel % 100000) or something like that first, that'd be enough to
  * not wrap around. But even if it wrapped, would that be a problem?
  */
-#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 49157) % FastPathLockGroupsPerBackend)
 
 /*
  * Given a lock index (into the per-backend array), calculated using the
@@ -213,7 +229,7 @@ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
 /* Calculate index in the whole per-backend array of lock slots. */
 #define FP_LOCK_SLOT_INDEX(group, index) \
-	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	(AssertMacro(((group) >= 0) && ((group) < FastPathLockGroupsPerBackend)), \
 	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
 	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
 
@@ -2100,7 +2116,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
 
-	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group >= 0 && group < FastPathLockGroupsPerBackend);
 
 	/* Attempt fast release of any lock eligible for the fast path. */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
@@ -2679,7 +2695,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	/* Scan for existing entry for this relid, remembering empty slot. */
 	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
@@ -2730,7 +2746,7 @@ FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	FastPathLocalUseCounts[group] = 0;
 	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
@@ -2810,7 +2826,7 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 		/* Which FP group does the lock belong to? */
 		group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-		Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+		Assert(group < FastPathLockGroupsPerBackend);
 
 		for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 		{
@@ -2879,7 +2895,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
 
@@ -3001,7 +3017,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ac66da8638f..a91b6f8a6c0 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -103,6 +103,8 @@ ProcGlobalShmemSize(void)
 	Size		size = 0;
 	Size		TotalProcs =
 		add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+	Size		fpLockBitsSize,
+				fpRelIdSize;
 
 	/* ProcGlobal */
 	size = add_size(size, sizeof(PROC_HDR));
@@ -113,6 +115,18 @@ ProcGlobalShmemSize(void)
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->subxidStates)));
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->statusFlags)));
 
+	/*
+	 * fast-path lock arrays
+	 *
+	 * XXX The explicit alignment may not be strictly necessary, as both
+	 * values are already multiples of 8 bytes, which is what MAXALIGN does.
+	 * But better to make that obvious.
+	 */
+	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+
+	size = add_size(size, mul_size(TotalProcs, (fpLockBitsSize + fpRelIdSize)));
+
 	return size;
 }
 
@@ -162,6 +176,10 @@ InitProcGlobal(void)
 				j;
 	bool		found;
 	uint32		TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
+	char	   *fpPtr,
+			   *fpEndPtr PG_USED_FOR_ASSERTS_ONLY;
+	Size		fpLockBitsSize,
+				fpRelIdSize;
 
 	/* Create the ProcGlobal shared structure */
 	ProcGlobal = (PROC_HDR *)
@@ -211,12 +229,38 @@ InitProcGlobal(void)
 	ProcGlobal->statusFlags = (uint8 *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->statusFlags));
 	MemSet(ProcGlobal->statusFlags, 0, TotalProcs * sizeof(*ProcGlobal->statusFlags));
 
+	/*
+	 * Allocate arrays for fast-path locks. Those are variable-length, so
+	 * can't be included in PGPROC. We allocate a separate piece of shared
+	 * memory and then divide that between backends.
+	 */
+	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+
+	fpPtr = ShmemAlloc(TotalProcs * (fpLockBitsSize + fpRelIdSize));
+	MemSet(fpPtr, 0, TotalProcs * (fpLockBitsSize + fpRelIdSize));
+
+	/* For asserts checking we did not overflow. */
+	fpEndPtr = fpPtr + (TotalProcs * (fpLockBitsSize + fpRelIdSize));
+
 	for (i = 0; i < TotalProcs; i++)
 	{
 		PGPROC	   *proc = &procs[i];
 
 		/* Common initialization for all PGPROCs, regardless of type. */
 
+		/*
+		 * Set the fast-path lock arrays, and move the pointer. We interleave
+		 * the two arrays, to keep at least some locality.
+		 */
+		proc->fpLockBits = (uint64 *) fpPtr;
+		fpPtr += fpLockBitsSize;
+
+		proc->fpRelId = (Oid *) fpPtr;
+		fpPtr += fpRelIdSize;
+
+		Assert(fpPtr <= fpEndPtr);
+
 		/*
 		 * Set up per-PGPROC semaphore, latch, and fpInfoLock.  Prepared xact
 		 * dummy PGPROCs don't need these though - they're never associated
@@ -278,6 +322,9 @@ InitProcGlobal(void)
 		pg_atomic_init_u64(&(proc->waitStart), 0);
 	}
 
+	/* We should have consumed exactly the expected amount of memory. */
+	Assert(fpPtr == fpEndPtr);
+
 	/*
 	 * Save pointers to the blocks of PGPROC structures reserved for auxiliary
 	 * processes and prepared transactions.
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea1135..f54ae00abca 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4166,6 +4166,9 @@ PostgresSingleUserMain(int argc, char *argv[],
 	/* Initialize MaxBackends */
 	InitializeMaxBackends();
 
+	/* Initialize size of fast-path lock cache. */
+	InitializeFastPathLocks();
+
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 3b50ce19a2c..1faf756c8d8 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -557,6 +557,40 @@ InitializeMaxBackends(void)
 						   MAX_BACKENDS)));
 }
 
+/*
+ * Initialize the number of fast-path lock slots in PGPROC.
+ *
+ * This must be called after modules have had the chance to alter GUCs in
+ * shared_preload_libraries and before shared memory size is determined.
+ *
+ * The default max_locks_per_xact=64 means 4 groups by default.
+ *
+ * We allow anything between 1 and 1024 groups, with the usual power-of-2
+ * logic. The 1 is the "old" value before allowing multiple groups, 1024
+ * is an arbitrary limit (matching max_locks_per_xact = 16k). Values over
+ * 1024 are unlikely to be beneficial - we're likely to hit other
+ * bottlenecks long before that.
+ */
+void
+InitializeFastPathLocks(void)
+{
+	Assert(FastPathLockGroupsPerBackend == 0);
+
+	/* we need at least one group */
+	FastPathLockGroupsPerBackend = 1;
+
+	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
+	{
+		/* stop once we exceed max_locks_per_xact */
+		if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
+			break;
+
+		FastPathLockGroupsPerBackend *= 2;
+	}
+
+	Assert(FastPathLockGroupsPerBackend <= FP_LOCK_GROUPS_PER_BACKEND_MAX);
+}
+
 /*
  * Early initialization of a backend (either standalone or under postmaster).
  * This happens even before InitPostgres.
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 25348e71eb9..e26d108a470 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -475,6 +475,7 @@ extern PGDLLIMPORT ProcessingMode Mode;
 #define INIT_PG_OVERRIDE_ROLE_LOGIN		0x0004
 extern void pg_split_opts(char **argv, int *argcp, const char *optstr);
 extern void InitializeMaxBackends(void);
+extern void InitializeFastPathLocks(void);
 extern void InitPostgres(const char *in_dbname, Oid dboid,
 						 const char *username, Oid useroid,
 						 bits32 flags,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 845058da9fa..0e55c166529 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,9 +83,11 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_GROUPS_PER_BACKEND	64
+extern PGDLLIMPORT int FastPathLockGroupsPerBackend;
+#define		FP_LOCK_GROUPS_PER_BACKEND_MAX	1024
 #define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
-#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
+
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -293,9 +295,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND]; /* lock modes held for
-														 * each fast-path slot */
-	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
+	uint64	   *fpLockBits;		/* lock modes held for each fast-path slot */
+	Oid		   *fpRelId;		/* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
 												 * lock */
-- 
2.46.0

From d9f3deaa518a673e4dc8df1ff6e40f47c2637e5e Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Thu, 5 Sep 2024 16:52:26 +0200
Subject: [PATCH v20240905 3/4] separate guc to allow benchmarking

---
 src/backend/bootstrap/bootstrap.c   |  2 --
 src/backend/postmaster/postmaster.c |  5 -----
 src/backend/tcop/postgres.c         |  3 ---
 src/backend/utils/init/postinit.c   | 34 -----------------------------
 src/backend/utils/misc/guc_tables.c | 10 +++++++++
 src/include/miscadmin.h             |  1 -
 6 files changed, 10 insertions(+), 45 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index ed59dfce893..7637581a184 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -309,8 +309,6 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 
 	InitializeMaxBackends();
 
-	InitializeFastPathLocks();
-
 	CreateSharedMemoryAndSemaphores();
 
 	/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index f4a16595d7f..96bc1d1cfed 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -903,11 +903,6 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	InitializeMaxBackends();
 
-	/*
-	 * Also calculate the size of the fast-path lock arrays in PGPROC.
-	 */
-	InitializeFastPathLocks();
-
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f54ae00abca..8bc6bea1135 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4166,9 +4166,6 @@ PostgresSingleUserMain(int argc, char *argv[],
 	/* Initialize MaxBackends */
 	InitializeMaxBackends();
 
-	/* Initialize size of fast-path lock cache. */
-	InitializeFastPathLocks();
-
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 1faf756c8d8..3b50ce19a2c 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -557,40 +557,6 @@ InitializeMaxBackends(void)
 						   MAX_BACKENDS)));
 }
 
-/*
- * Initialize the number of fast-path lock slots in PGPROC.
- *
- * This must be called after modules have had the chance to alter GUCs in
- * shared_preload_libraries and before shared memory size is determined.
- *
- * The default max_locks_per_xact=64 means 4 groups by default.
- *
- * We allow anything between 1 and 1024 groups, with the usual power-of-2
- * logic. The 1 is the "old" value before allowing multiple groups, 1024
- * is an arbitrary limit (matching max_locks_per_xact = 16k). Values over
- * 1024 are unlikely to be beneficial - we're likely to hit other
- * bottlenecks long before that.
- */
-void
-InitializeFastPathLocks(void)
-{
-	Assert(FastPathLockGroupsPerBackend == 0);
-
-	/* we need at least one group */
-	FastPathLockGroupsPerBackend = 1;
-
-	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
-	{
-		/* stop once we exceed max_locks_per_xact */
-		if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
-			break;
-
-		FastPathLockGroupsPerBackend *= 2;
-	}
-
-	Assert(FastPathLockGroupsPerBackend <= FP_LOCK_GROUPS_PER_BACKEND_MAX);
-}
-
 /*
  * Early initialization of a backend (either standalone or under postmaster).
  * This happens even before InitPostgres.
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 686309db58b..cef6341979f 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2788,6 +2788,16 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"fastpath_lock_groups", PGC_POSTMASTER, LOCK_MANAGEMENT,
+			gettext_noop("Sets the number of groups in the fast-path lock array."),
+			gettext_noop("Each group provides 16 fast-path relation lock slots.")
+		},
+		&FastPathLockGroupsPerBackend,
+		1, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"max_pred_locks_per_transaction", PGC_POSTMASTER, LOCK_MANAGEMENT,
 			gettext_noop("Sets the maximum number of predicate locks per transaction."),
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index e26d108a470..25348e71eb9 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -475,7 +475,6 @@ extern PGDLLIMPORT ProcessingMode Mode;
 #define INIT_PG_OVERRIDE_ROLE_LOGIN		0x0004
 extern void pg_split_opts(char **argv, int *argcp, const char *optstr);
 extern void InitializeMaxBackends(void);
-extern void InitializeFastPathLocks(void);
 extern void InitPostgres(const char *in_dbname, Oid dboid,
 						 const char *username, Oid useroid,
 						 bits32 flags,
-- 
2.46.0

From 6fbe413d86ecb1dca6acf939ab06550290ec337b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Tue, 3 Sep 2024 19:27:16 +0200
Subject: [PATCH v20240905 4/4] lock stats

---
 src/backend/catalog/system_views.sql      |   6 +
 src/backend/storage/lmgr/lock.c           |  18 +++
 src/backend/utils/activity/Makefile       |   1 +
 src/backend/utils/activity/pgstat.c       |  19 +++
 src/backend/utils/activity/pgstat_locks.c | 134 ++++++++++++++++++++++
 src/backend/utils/adt/pgstatfuncs.c       |  18 +++
 src/include/catalog/pg_proc.dat           |  13 +++
 src/include/pgstat.h                      |  21 +++-
 src/include/utils/pgstat_internal.h       |  22 ++++
 9 files changed, 251 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/utils/activity/pgstat_locks.c

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a18..f5aecf14365 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1134,6 +1134,12 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_locks AS
+    SELECT
+        pg_stat_get_fplocks_num_inserted() AS num_inserted,
+        pg_stat_get_fplocks_num_overflowed() AS num_overflowed,
+        pg_stat_get_fplocks_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_checkpointer AS
     SELECT
         pg_stat_get_checkpointer_num_timed() AS num_timed,
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 134cd8a6e34..ecaf64b614c 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -39,6 +39,7 @@
 #include "access/xlogutils.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
@@ -964,6 +965,23 @@ LockAcquireExtended(const LOCKTAG *locktag,
 		log_lock = true;
 	}
 
+	/*
+	 * See if an eligible lock would fit into the fast path cache or not.
+	 * This is not quite correct, for two reasons. Firstly, eligible locks
+	 * may end up requiring a regular lock because of a strong lock being
+	 * held by someone else. Secondly, the count can be a bit stale, if
+	 * some other backend promoted some of our fast-path locks.
+	 *
+	 * XXX Worth counting non-eligible locks too?
+	 */
+	if (EligibleForRelationFastPath(locktag, lockmode))
+	{
+		if (FastPathLocalUseCounts[FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2)] < FP_LOCK_SLOTS_PER_GROUP)
+			++PendingFastPathLockStats.num_inserted;
+		else
+			++PendingFastPathLockStats.num_overflowed;
+	}
+
 	/*
 	 * Attempt to take lock via fast path, if eligible.  But if we remember
 	 * having filled up the fast path array, we don't attempt to make any
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index b9fd66ea17c..4b595f304d0 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -25,6 +25,7 @@ OBJS = \
 	pgstat_database.o \
 	pgstat_function.o \
 	pgstat_io.o \
+	pgstat_locks.o \
 	pgstat_relation.o \
 	pgstat_replslot.o \
 	pgstat_shmem.o \
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 178b5ef65aa..39475c5915f 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -81,6 +81,7 @@
  * - pgstat_database.c
  * - pgstat_function.c
  * - pgstat_io.c
+ * - pgstat_locks.c
  * - pgstat_relation.c
  * - pgstat_replslot.c
  * - pgstat_slru.c
@@ -446,6 +447,21 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_all_cb = pgstat_wal_reset_all_cb,
 		.snapshot_cb = pgstat_wal_snapshot_cb,
 	},
+
+	[PGSTAT_KIND_FPLOCKS] = {
+		.name = "fp-locks",
+
+		.fixed_amount = true,
+
+		.snapshot_ctl_off = offsetof(PgStat_Snapshot, fplocks),
+		.shared_ctl_off = offsetof(PgStat_ShmemControl, fplocks),
+		.shared_data_off = offsetof(PgStatShared_FastPathLocks, stats),
+		.shared_data_len = sizeof(((PgStatShared_FastPathLocks *) 0)->stats),
+
+		.init_shmem_cb = pgstat_fplocks_init_shmem_cb,
+		.reset_all_cb = pgstat_fplocks_reset_all_cb,
+		.snapshot_cb = pgstat_fplocks_snapshot_cb,
+	},
 };
 
 /*
@@ -739,6 +755,9 @@ pgstat_report_stat(bool force)
 	/* flush SLRU stats */
 	partial_flush |= pgstat_slru_flush(nowait);
 
+	/* flush lock stats */
+	partial_flush |= pgstat_fplocks_flush(nowait);
+
 	last_flush = now;
 
 	/*
diff --git a/src/backend/utils/activity/pgstat_locks.c b/src/backend/utils/activity/pgstat_locks.c
new file mode 100644
index 00000000000..99a5d5259da
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_locks.c
@@ -0,0 +1,134 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_locks.c
+ *	  Implementation of locks statistics.
+ *
+ * This file contains the implementation of lock statistics. It is kept
+ * separate from pgstat.c to enforce the line between the statistics access /
+ * storage implementation and the details about individual types of
+ * statistics.
+ *
+ * Copyright (c) 2001-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_locks.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+
+PgStat_FastPathLockStats PendingFastPathLockStats = {0};
+
+
+
+/*
+ * Do we have any locks to report?
+ */
+static bool
+pgstat_have_pending_locks(void)
+{
+	return (PendingFastPathLockStats.num_inserted > 0) ||
+		   (PendingFastPathLockStats.num_overflowed > 0);
+}
+
+
+/*
+ * If nowait is true, this function returns true if the lock could not be
+ * acquired. Otherwise return false.
+ */
+bool
+pgstat_fplocks_flush(bool nowait)
+{
+	PgStatShared_FastPathLocks *stats_shmem = &pgStatLocal.shmem->fplocks;
+
+	Assert(IsUnderPostmaster || !IsPostmasterEnvironment);
+	Assert(pgStatLocal.shmem != NULL &&
+		   !pgStatLocal.shmem->is_shutdown);
+
+	/*
+	 * This function can be called even if nothing at all has happened. Avoid
+	 * taking lock for nothing in that case.
+	 */
+	if (!pgstat_have_pending_locks())
+		return false;
+
+	if (!nowait)
+		LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE);
+	else if (!LWLockConditionalAcquire(&stats_shmem->lock, LW_EXCLUSIVE))
+		return true;
+
+#define FPLOCKS_ACC(fld) stats_shmem->stats.fld += PendingFastPathLockStats.fld
+	FPLOCKS_ACC(num_inserted);
+	FPLOCKS_ACC(num_overflowed);
+#undef FPLOCKS_ACC
+
+	LWLockRelease(&stats_shmem->lock);
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&PendingFastPathLockStats, 0, sizeof(PendingFastPathLockStats));
+
+	return false;
+}
+
+/*
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * a pointer to the fast-path lock statistics struct.
+ */
+PgStat_FastPathLockStats *
+pgstat_fetch_stat_fplocks(void)
+{
+	pgstat_snapshot_fixed(PGSTAT_KIND_FPLOCKS);
+
+	return &pgStatLocal.snapshot.fplocks;
+}
+
+void
+pgstat_fplocks_init_shmem_cb(void *stats)
+{
+	PgStatShared_FastPathLocks *stats_shmem = (PgStatShared_FastPathLocks *) stats;
+
+	LWLockInitialize(&stats_shmem->lock, LWTRANCHE_PGSTATS_DATA);
+}
+
+void
+pgstat_fplocks_reset_all_cb(TimestampTz ts)
+{
+	PgStatShared_FastPathLocks *stats_shmem = &pgStatLocal.shmem->fplocks;
+
+	/* see explanation above PgStatShared_FastPathLocks for the reset protocol */
+	LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE);
+	pgstat_copy_changecounted_stats(&stats_shmem->reset_offset,
+									&stats_shmem->stats,
+									sizeof(stats_shmem->stats),
+									&stats_shmem->changecount);
+	stats_shmem->stats.stat_reset_timestamp = ts;
+	LWLockRelease(&stats_shmem->lock);
+}
+
+void
+pgstat_fplocks_snapshot_cb(void)
+{
+	PgStatShared_FastPathLocks *stats_shmem = &pgStatLocal.shmem->fplocks;
+	PgStat_FastPathLockStats *reset_offset = &stats_shmem->reset_offset;
+	PgStat_FastPathLockStats reset;
+
+	pgstat_copy_changecounted_stats(&pgStatLocal.snapshot.fplocks,
+									&stats_shmem->stats,
+									sizeof(stats_shmem->stats),
+									&stats_shmem->changecount);
+
+	LWLockAcquire(&stats_shmem->lock, LW_SHARED);
+	memcpy(&reset, reset_offset, sizeof(stats_shmem->stats));
+	LWLockRelease(&stats_shmem->lock);
+
+	/* compensate by reset offsets */
+#define FPLOCKS_COMP(fld) pgStatLocal.snapshot.fplocks.fld -= reset.fld;
+	FPLOCKS_COMP(num_inserted);
+	FPLOCKS_COMP(num_overflowed);
+#undef FPLOCKS_COMP
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 97dc09ac0d9..dcd4957777d 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1261,6 +1261,24 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_alloc);
 }
 
+Datum
+pg_stat_get_fplocks_num_inserted(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_fplocks()->num_inserted);
+}
+
+Datum
+pg_stat_get_fplocks_num_overflowed(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_fplocks()->num_overflowed);
+}
+
+Datum
+pg_stat_get_fplocks_stat_reset_time(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_fplocks()->stat_reset_timestamp);
+}
+
 /*
 * When adding a new column to the pg_stat_io view, add a new enum value
 * here above IO_NUM_COLUMNS.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ff5436acacf..242aea463ae 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5986,6 +5986,19 @@
   provolatile => 'v', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_subscription_stats' },
 
+{ oid => '6095', descr => 'statistics: number of acquired fast-path locks',
+  proname => 'pg_stat_get_fplocks_num_inserted', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_fplocks_num_inserted' },
+
+{ oid => '6096', descr => 'statistics: number of not acquired fast-path locks',
+  proname => 'pg_stat_get_fplocks_num_overflowed', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_fplocks_num_overflowed' },
+
+{ oid => '6097', descr => 'statistics: last reset for the fast-path locks',
+  proname => 'pg_stat_get_fplocks_stat_reset_time', provolatile => 's',
+  proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
+  prosrc => 'pg_stat_get_fplocks_stat_reset_time' },
+
 { oid => '3163', descr => 'current trigger depth',
   proname => 'pg_trigger_depth', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_trigger_depth' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2c91168a1..f66b189f8df 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -57,9 +57,10 @@
 #define PGSTAT_KIND_IO	9
 #define PGSTAT_KIND_SLRU	10
 #define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_FPLOCKS	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
-#define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
+#define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_FPLOCKS
 #define PGSTAT_KIND_BUILTIN_SIZE (PGSTAT_KIND_BUILTIN_MAX + 1)
 
 /* Custom stats kinds */
@@ -303,6 +304,13 @@ typedef struct PgStat_CheckpointerStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_CheckpointerStats;
 
+typedef struct PgStat_FastPathLockStats
+{
+	PgStat_Counter num_inserted;
+	PgStat_Counter num_overflowed;
+	TimestampTz stat_reset_timestamp;
+} PgStat_FastPathLockStats;
+
 
 /*
  * Types related to counting IO operations
@@ -538,6 +546,10 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern void pgstat_report_bgwriter(void);
 extern PgStat_BgWriterStats *pgstat_fetch_stat_bgwriter(void);
 
+/*
+ * Functions in pgstat_locks.c
+ */
+extern PgStat_FastPathLockStats *pgstat_fetch_stat_fplocks(void);
 
 /*
  * Functions in pgstat_checkpointer.c
@@ -811,4 +823,11 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
 extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
 
 
+/*
+ * Variables in pgstat_locks.c
+ */
+
+/* updated directly by fast-path locking */
+extern PGDLLIMPORT PgStat_FastPathLockStats PendingFastPathLockStats;
+
 #endif							/* PGSTAT_H */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 25820cbf0a6..0627983846c 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -340,6 +340,15 @@ typedef struct PgStatShared_BgWriter
 	PgStat_BgWriterStats reset_offset;
 } PgStatShared_BgWriter;
 
+typedef struct PgStatShared_FastPathLocks
+{
+	/* lock protects ->reset_offset as well as stats->stat_reset_timestamp */
+	LWLock		lock;
+	uint32		changecount;
+	PgStat_FastPathLockStats stats;
+	PgStat_FastPathLockStats reset_offset;
+} PgStatShared_FastPathLocks;
+
 typedef struct PgStatShared_Checkpointer
 {
 	/* lock protects ->reset_offset as well as stats->stat_reset_timestamp */
@@ -453,6 +462,7 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_IO io;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
+	PgStatShared_FastPathLocks fplocks;
 
 	/*
 	 * Custom stats data with fixed-numbered objects, indexed by (PgStat_Kind
@@ -487,6 +497,8 @@ typedef struct PgStat_Snapshot
 
 	PgStat_WalStats wal;
 
+	PgStat_FastPathLockStats fplocks;
+
 	/*
 	 * Data in snapshot for custom fixed-numbered statistics, indexed by
 	 * (PgStat_Kind - PGSTAT_KIND_CUSTOM_MIN).  Each entry is allocated in
@@ -704,6 +716,16 @@ extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
 extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
 
 
+/*
+ * Functions in pgstat_locks.c
+ */
+
+extern bool pgstat_fplocks_flush(bool);
+extern void pgstat_fplocks_init_shmem_cb(void *stats);
+extern void pgstat_fplocks_reset_all_cb(TimestampTz ts);
+extern void pgstat_fplocks_snapshot_cb(void);
+
+
 /*
  * Variables in pgstat.c
  */
-- 
2.46.0
