Hello Michael, thank you for your review.
I am sending a new series of patches that incorporates your code review
notes. The version prefix is "vVL2-" to distinguish Andrey's patches
from mine.
On 5/11/26 04:58, Michael Paquier wrote:
> On Thu, May 07, 2026 at 04:57:28PM +0500, Andrey Borodin wrote:
>> Done so. I still keep Vlad's injection_point_condition.h though.
>> It seems useful. But no more Meson/Makefile changes, and the new SQL
>> stuff now lives near the removable_cutoff() SQL.
>
> Because you want to have your callbacks in a new regress_prockill.c
> while being able to reuse the structures. I don't see why not on
> clarity grounds. The creation of injection_point_condition.h could be
> done in its own commit, independent of the rest, but that's just me
> being pedantic about such matters. I think that this should be
> backpatched anyway; that could make the introduction of new tests
> easier.
I moved the creation of injection_point_condition.h into a separate
commit, the 0001 patch. Additionally, I noticed that if several
processes are waiting on the same injection point, only one of them is
awakened by a single injection_points_wakeup() call. I am not sure
whether this behavior is intentional; please let me know.
For context, MySQL has a similar feature called debug sync points [0],
which allows waking up several threads waiting on the same sync point.
I suppose the same semantics would be useful for injection points as
well, to simplify tests. I implemented this behavior in the 0002 patch
and used it in 0003. If you disagree with this approach, I can rewrite
the test to use two separate injection points for the lock group leader
and follower.
> +# Two backends form a lock group, then are terminated concurrently. The test
> +# uses injection points placed inside ProcKill() to pause both victims there
> +# and verify that a freshly forked backend can claim the recycled PGPROC slot
> +# without hitting "latch already owned by PID ..." PANIC.
>
> This claim is slightly incorrect here. If I undo the fix, place the
> two INJECTION_POINT macros and re-run the test, the test does not
> produce a PANIC. In order to reproduce a PANIC, it seems that we
> would need something more complicated, with an extra point after a
> DisownLatch(). That is not something I require, still this is
> misleading and could be improved, I guess.
Yes, you are right. I fixed the race in the 0003 patch with an
additional injection point.
> I think that we should refactor the patch into two pieces,
> for clarity:
> - Patch 1 refactors the freelist manipulations and introduces the two
> new boolean flags, documenting more precisely how the freelists are
> manipulated and why things are done this way (leader, self, etc.).
> - Patch 2, which is about the earlier DisownLatch(), then becomes a
> fix of its own. That's the primary fix we care about, to avoid the
> entries being recycled due to a PGPROC entry being marked as
> available too early.
Done. I have split this into two separate patches, 0004 and 0005. The
test in 0003 fails with only 0004 applied and passes once 0005 is
applied as well.
> Also, I suspect that the test is racy? For one, I strongly suspect
> that you are going to need tweaks similar to 011_lock_stats.pl in
> test_misc, where we rely on some query_until() with psql banners, to
> make sure that the waits actually happen. I got the point about
> pg_stat_activity not being something we can rely on here. That's
> something that would be easier to see on an overloaded machine.
The test should no longer be racy following the changes in 0003.
--
Best regards,
Vlad
[0] https://dev.mysql.com/doc/dev/mysql-server/9.6.0/PAGE_DEBUG_SYNC.html
From f03ed5938323b2a7fbddbd60f33ad8d80e3fc58e Mon Sep 17 00:00:00 2001
From: Vlad Lesin <[email protected]>
Date: Fri, 15 May 2026 09:32:54 +0300
Subject: [PATCH 1/5] injection_points: extract InjectionPointCondition into a
header
Move InjectionPointConditionType and InjectionPointCondition from
injection_points.c into a new injection_point_condition.h, and include
that header from injection_points.c. No functional change.
The struct is the private_data layout used when attaching an injection
point with a scoping condition: the producer fills it in and passes
its address to InjectionPointAttach(); the registered callback
(injection_wait / injection_error / injection_notice in
injection_points.c) receives the pointer back and consults
injection_point_allowed() to decide whether to fire. Having a single
header keeps both sides of that layout contract in one place so a
future field addition lands in exactly one source location.
---
.../injection_point_condition.h | 37 +++++++++++++++++++
.../injection_points/injection_points.c | 26 +------------
2 files changed, 39 insertions(+), 24 deletions(-)
create mode 100644 src/test/modules/injection_points/injection_point_condition.h
diff --git a/src/test/modules/injection_points/injection_point_condition.h b/src/test/modules/injection_points/injection_point_condition.h
new file mode 100644
index 00000000000..b469601849b
--- /dev/null
+++ b/src/test/modules/injection_points/injection_point_condition.h
@@ -0,0 +1,37 @@
+/*-------------------------------------------------------------------------
+ *
+ * injection_point_condition.h
+ * Shared layout for the InjectionPointCondition private_data blob
+ *
+ * Used as private_data when attaching an injection_wait callback to scope
+ * the wait to a specific backend. Defined in a header so the producers
+ * (injection_points.c's injection_points_attach() and regress_prockill.c's
+ * prockill_attach_injection_wait_pid()) and the consumer
+ * (injection_points.c's injection_wait via injection_point_allowed())
+ * agree on the layout.
+ *
+ * Copyright (c) 2025-2026, PostgreSQL Global Development Group
+ *
+ * src/test/modules/injection_points/injection_point_condition.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef INJECTION_POINT_CONDITION_H
+#define INJECTION_POINT_CONDITION_H
+
+typedef enum InjectionPointConditionType
+{
+ INJ_CONDITION_ALWAYS = 0, /* always run */
+ INJ_CONDITION_PID, /* PID restriction */
+} InjectionPointConditionType;
+
+typedef struct InjectionPointCondition
+{
+ /* Type of the condition */
+ InjectionPointConditionType type;
+
+ /* ID of the process where the injection point is allowed to run */
+ int pid;
+} InjectionPointCondition;
+
+#endif /* INJECTION_POINT_CONDITION_H */
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index 0f1af513673..fb7a7d477cc 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -34,36 +34,14 @@
#include "utils/tuplestore.h"
#include "utils/wait_event.h"
+#include "injection_point_condition.h"
+
PG_MODULE_MAGIC;
/* Maximum number of waits usable in injection points at once */
#define INJ_MAX_WAIT 8
#define INJ_NAME_MAXLEN 64
-/*
- * Conditions related to injection points. This tracks in shared memory the
- * runtime conditions under which an injection point is allowed to run,
- * stored as private_data when an injection point is attached, and passed as
- * argument to the callback.
- *
- * If more types of runtime conditions need to be tracked, this structure
- * should be expanded.
- */
-typedef enum InjectionPointConditionType
-{
- INJ_CONDITION_ALWAYS = 0, /* always run */
- INJ_CONDITION_PID, /* PID restriction */
-} InjectionPointConditionType;
-
-typedef struct InjectionPointCondition
-{
- /* Type of the condition */
- InjectionPointConditionType type;
-
- /* ID of the process where the injection point is allowed to run */
- int pid;
-} InjectionPointCondition;
-
/*
* List of injection points stored in TopMemoryContext attached
* locally to this process.
--
2.43.0
From 07e7a7fe83540f0918290ae581c72d5a287b70aa Mon Sep 17 00:00:00 2001
From: Vlad Lesin <[email protected]>
Date: Fri, 15 May 2026 00:20:17 +0300
Subject: [PATCH 2/5] injection_points: wake every matching waiter in
injection_points_wakeup()
injection_points_wakeup() previously bumped only the first matching
slot's counter and broke out of the loop. ConditionVariableBroadcast
already wakes every CV waiter, but a waiter whose own counter was not
bumped re-checks its slot under inj_state->lock and goes right back
to sleep, so a single wakeup could not release multiple backends
parked on the same point name.
Bump every matching slot's counter in the same critical section, and
preserve the existing "could not find injection point %s to wake up"
error path via a found boolean. All existing in-tree callers attach
one waiter per name, so waking "all of 1" is observationally
identical to "first of 1" -- no behavior change for them.
The wake-only-first behavior is undocumented. The function was
introduced by commit 37b369dc67b ("injection_points: Add wait and
wakeup of processes"), and neither that commit's log entry nor the
in-code comments justify it. Subsequent commits did not revisit
this loop (git log -L on injection_points_wakeup since 37b369dc67b
returns only the introducing commit). Every in-tree caller happens
to attach exactly one waiter per name, so the limitation has never
been exercised and the regression test added in a following commit
is the first scenario that depends on the corrected semantics.
---
.../injection_points/injection_points.c | 23 +++++++++++--------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index fb7a7d477cc..35f01fc360b 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -439,29 +439,32 @@ Datum
injection_points_wakeup(PG_FUNCTION_ARGS)
{
char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
- int index = -1;
+ bool found = false;
if (inj_state == NULL)
injection_init_shmem();
- /* First bump the wait counter for the injection point to wake up */
+ /*
+ * Bump the wait counter for every slot waiting on this name.
+ * ConditionVariableBroadcast wakes every CV waiter, but a waiter whose
+ * own counter was not bumped re-checks its slot and goes back to sleep --
+ * so we must bump them all to release multiple backends parked on the
+ * same point name.
+ */
SpinLockAcquire(&inj_state->lock);
for (int i = 0; i < INJ_MAX_WAIT; i++)
{
if (strcmp(name, inj_state->name[i]) == 0)
{
- index = i;
- break;
+ inj_state->wait_counts[i]++;
+ found = true;
}
}
- if (index < 0)
- {
- SpinLockRelease(&inj_state->lock);
- elog(ERROR, "could not find injection point %s to wake up", name);
- }
- inj_state->wait_counts[index]++;
SpinLockRelease(&inj_state->lock);
+ if (!found)
+ elog(ERROR, "could not find injection point %s to wake up", name);
+
/* And broadcast the change to the waiters */
ConditionVariableBroadcast(&inj_state->wait_point);
PG_RETURN_VOID();
--
2.43.0
From 10684fed3937d60ccd3738b712c848c7c39a41fc Mon Sep 17 00:00:00 2001
From: Vlad Lesin <[email protected]>
Date: Thu, 14 May 2026 16:27:27 +0300
Subject: [PATCH 3/5] Add regression test for ProcKill lock-group procLatch
recycle race
Two backends form a lock group with new SQL helpers and are
terminated concurrently. Two injection points placed inside
ProcKill() let the test pause the victims so that a freshly forked
backend's attempt to claim the recycled PGPROC slot can be observed:
if the leader's procLatch is still owned at the moment the slot is
published on the freelist, the new backend's OwnLatch() PANICs with
"latch already owned by PID ...".
The first injection point, "prockill-after-lockgroup", sits inside
the enclosing MyProc->lockGroupLeader != NULL branch in ProcKill --
so only backends that are part of a lock group (leader or follower)
reach it. Both victims park there once they have entered their
lock-group teardown. The new prockill_backend_in_injection() helper
reads PGPROC->wait_event_info from ProcGlobal->allProcs directly to
detect when a victim has parked.
The injection_wait must be attached from a controller session rather
than from a victim, and the attachment must outlive the controller's
own backend exit. injection_points_attach() registers a
before_shmem_exit cleanup that detaches the point when its caller's
backend exits (and, if called from a victim, before ProcKill ever
runs). The new prockill_attach_injection_wait() helper calls
InjectionPointAttach() directly, leaving no such hook.
The race is timing-dependent in practice, so a second injection
point is placed in proc.c immediately after the lock-group block,
"prockill-pre-disown-latch". It is unconditional at the macro level
(any exiting backend traverses it), but only backends matching a
PID-scoped attach actually wait. The new
prockill_attach_injection_wait_pid() helper attaches a wait scoped to
a specific backend's PID using
InjectionPointCondition{INJ_CONDITION_PID, pid} as private_data; like
the unconditional variant, it bypasses injection_points_attach()'s
before_shmem_exit cleanup hook.
The test attaches the unconditional first point and the PID-scoped
second point (scoped to the leader's PID), then terminates both
victims concurrently. After both wake from
"prockill-after-lockgroup":
- the leader reaches "prockill-pre-disown-latch" and parks
(PID-scoped condition matches MyProcPid);
- the follower reaches the same point but skips the wait (PID
does not match), continues through SwitchBackToLocalLatch,
pgstat_reset_wait_event_storage, MyProc = NULL, the DisownLatch
of its own latch, and the consolidated freelist push --
publishing the leader's PGPROC while the leader is still parked.
If the leader's procLatch is still owned at that moment, the next
backend fork PANICs in OwnLatch() -- which the test detects via the
server log.
Meson propagates enable_injection_points into the new TAP test's
environment (matching test_misc / test_aio / test_checksums), so it
does not need an explicit env-var prefix at run time. The test
expects a fixed server and must land together with the matching
ProcKill fix. Requires --enable-injection-points.
---
src/backend/storage/lmgr/proc.c | 17 ++
src/test/modules/injection_points/Makefile | 5 +-
.../injection_points--1.0.sql | 41 +++++
src/test/modules/injection_points/meson.build | 9 ++
.../injection_points/regress_prockill.c | 133 ++++++++++++++++
.../t/001_prockill_lockgroup.pl | 146 ++++++++++++++++++
6 files changed, 350 insertions(+), 1 deletion(-)
create mode 100644 src/test/modules/injection_points/regress_prockill.c
create mode 100644 src/test/modules/injection_points/t/001_prockill_lockgroup.pl
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 1ac25068d62..9ba0d4790fd 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -990,8 +990,25 @@ ProcKill(int code, Datum arg)
else if (leader != MyProc)
MyProc->lockGroupLeader = NULL;
LWLockRelease(leader_lwlock);
+
+ /*
+ * Test hook for src/test/modules/injection_points. Gated by the
+ * enclosing lockGroupLeader != NULL branch, so only backends that are
+ * part of a lock group (leader or follower) reach it.
+ */
+ INJECTION_POINT("prockill-after-lockgroup", NULL);
}
+ /*
+ * Test hook for src/test/modules/injection_points. Unconditional: any
+ * exiting backend reaches it. Combined with a PID-scoped wait attach, it
+ * lets a test stall a specific victim between the lock-group block and
+ * the (post-block, pre-fix) DisownLatch so that the consolidated freelist
+ * push of the leader's slot by the follower can race against the leader's
+ * still-owned procLatch.
+ */
+ INJECTION_POINT("prockill-pre-disown-latch", NULL);
+
/*
* Reset MyLatch to the process local one. This is so that signal
* handlers et al can continue using the latch after the shared latch
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index c01d2fb095c..8a2bdebbbbb 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -4,7 +4,8 @@ MODULE_big = injection_points
OBJS = \
$(WIN32RES) \
injection_points.o \
- regress_injection.o
+ regress_injection.o \
+ regress_prockill.o
EXTENSION = injection_points
DATA = injection_points--1.0.sql
PGFILEDESC = "injection_points - facility for injection points"
@@ -12,6 +13,8 @@ PGFILEDESC = "injection_points - facility for injection points"
REGRESS = injection_points hashagg reindex_conc vacuum
REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
+TAP_TESTS = 1
+
ISOLATION = basic \
inplace \
repack \
diff --git a/src/test/modules/injection_points/injection_points--1.0.sql b/src/test/modules/injection_points/injection_points--1.0.sql
index 861c7355d4e..1169e018eb6 100644
--- a/src/test/modules/injection_points/injection_points--1.0.sql
+++ b/src/test/modules/injection_points/injection_points--1.0.sql
@@ -108,3 +108,44 @@ CREATE FUNCTION removable_cutoff(rel regclass)
RETURNS xid8
AS 'MODULE_PATHNAME'
LANGUAGE C CALLED ON NULL INPUT;
+
+--
+-- regress_prockill.c functions
+--
+-- Form a lock group without parallel query so ProcKill lock-group teardown
+-- can be tested with injection points.
+--
+CREATE FUNCTION prockill_become_lock_group_leader()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION prockill_become_lock_group_member(leader_pid int)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+-- Attach injection_wait to point_name with no condition. Must be called from
+-- a controller session, not from a backend that will itself fire the point
+-- (see source).
+CREATE FUNCTION prockill_attach_injection_wait(point_name text)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+-- Attach injection_wait to point_name with INJ_CONDITION_PID scoping it to a
+-- specific backend's PID (the callback returns immediately for any other
+-- backend that fires the point). Like the unconditional variant above, must
+-- be called from a controller session.
+CREATE FUNCTION prockill_attach_injection_wait_pid(point_name text, target_pid int)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+-- Return true if target_pid is currently waiting at the named injection point,
+-- read directly from its PGPROC slot (works inside ProcKill where
+-- pg_stat_activity and ProcArray are already torn down).
+CREATE FUNCTION prockill_backend_in_injection(target_pid int, point_name text)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 59dba1cb023..626a1a8641c 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -7,6 +7,7 @@ endif
injection_points_sources = files(
'injection_points.c',
'regress_injection.c',
+ 'regress_prockill.c',
)
if host_system == 'windows'
@@ -41,6 +42,14 @@ tests += {
# The injection points are cluster-wide, so disable installcheck
'runningcheck': false,
},
+ 'tap': {
+ 'env': {
+ 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+ },
+ 'tests': [
+ 't/001_prockill_lockgroup.pl',
+ ],
+ },
'isolation': {
'specs': [
'basic',
diff --git a/src/test/modules/injection_points/regress_prockill.c b/src/test/modules/injection_points/regress_prockill.c
new file mode 100644
index 00000000000..69a73adcc49
--- /dev/null
+++ b/src/test/modules/injection_points/regress_prockill.c
@@ -0,0 +1,133 @@
+/*-------------------------------------------------------------------------
+ *
+ * regress_prockill.c
+ * SQL helpers for the ProcKill lock-group TAP test
+ *
+ * Exposes lock-group formation without parallel query and helpers that
+ * observe victim backends while they are inside ProcKill().
+ *
+ * Copyright (c) 2025-2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/injection_points/regress_prockill.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "utils/builtins.h"
+#include "utils/injection_point.h"
+#include "utils/wait_event.h"
+
+#include "injection_point_condition.h"
+
+/* Read a uint32 field exactly once (matches waitfuncs.c idiom). */
+#define UINT32_ACCESS_ONCE(var) ((uint32) (*((volatile uint32 *) &(var))))
+
+PG_FUNCTION_INFO_V1(prockill_become_lock_group_leader);
+
+Datum
+prockill_become_lock_group_leader(PG_FUNCTION_ARGS)
+{
+ BecomeLockGroupLeader();
+ PG_RETURN_VOID();
+}
+
+PG_FUNCTION_INFO_V1(prockill_become_lock_group_member);
+
+Datum
+prockill_become_lock_group_member(PG_FUNCTION_ARGS)
+{
+ int leader_pid = PG_GETARG_INT32(0);
+ PGPROC *leader;
+
+ leader = BackendPidGetProc(leader_pid);
+ if (leader == NULL)
+ elog(ERROR, "backend with PID %d not found", leader_pid);
+
+ if (!BecomeLockGroupMember(leader, leader_pid))
+ elog(ERROR, "could not join lock group of backend %d", leader_pid);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Attach injection_wait to point_name with no condition. The caller is
+ * responsible for ensuring the injection point fires only in the contexts of
+ * interest -- in the lock-group teardown test, the point is placed inside a
+ * MyProc->lockGroupLeader != NULL branch in ProcKill().
+ *
+ * Must be called from a controller session, not from a backend that is itself
+ * about to fire the point. injection_points_attach() registers a
+ * before_shmem_exit(injection_points_cleanup) hook that would detach the
+ * point as soon as the controller's session ends (and, if called from a
+ * victim, before ProcKill ever runs). Calling InjectionPointAttach()
+ * directly leaves no such hook.
+ */
+PG_FUNCTION_INFO_V1(prockill_attach_injection_wait);
+
+Datum
+prockill_attach_injection_wait(PG_FUNCTION_ARGS)
+{
+ char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+
+ InjectionPointAttach(name, "injection_points", "injection_wait", NULL, 0);
+ PG_RETURN_VOID();
+}
+
+/*
+ * Attach injection_wait to point_name with INJ_CONDITION_PID scoping it to a
+ * specific backend's PID. See injection_point_condition.h for the layout
+ * of the private_data blob.
+ */
+PG_FUNCTION_INFO_V1(prockill_attach_injection_wait_pid);
+
+Datum
+prockill_attach_injection_wait_pid(PG_FUNCTION_ARGS)
+{
+ char *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+ int target_pid = PG_GETARG_INT32(1);
+ InjectionPointCondition cond = {INJ_CONDITION_PID, target_pid};
+
+ InjectionPointAttach(name, "injection_points", "injection_wait",
+ &cond, sizeof(cond));
+ PG_RETURN_VOID();
+}
+
+/*
+ * Return true if the backend with target_pid is currently waiting at the
+ * named injection point, by reading PGPROC->wait_event_info directly.
+ *
+ * At the injection points in ProcKill(), the standard observability surfaces
+ * (pg_stat_activity, BackendPidGetProc) are already unavailable: both
+ * pgstat_beshutdown_hook and RemoveProcFromArray run as on_shmem_exit
+ * callbacks before ProcKill. The PGPROC slot itself remains intact until
+ * ProcKill zeroes proc->pid near its end, so a direct allProcs scan works.
+ */
+PG_FUNCTION_INFO_V1(prockill_backend_in_injection);
+
+Datum
+prockill_backend_in_injection(PG_FUNCTION_ARGS)
+{
+ int target_pid = PG_GETARG_INT32(0);
+ char *want_name = text_to_cstring(PG_GETARG_TEXT_PP(1));
+ uint32 n = ProcGlobal->allProcCount;
+
+ for (uint32 i = 0; i < n; i++)
+ {
+ PGPROC *proc = &ProcGlobal->allProcs[i];
+ const char *ev;
+
+ if (proc->pid != target_pid)
+ continue;
+
+ ev = pgstat_get_wait_event(UINT32_ACCESS_ONCE(proc->wait_event_info));
+ PG_RETURN_BOOL(ev != NULL && strcmp(ev, want_name) == 0);
+ }
+
+ PG_RETURN_BOOL(false);
+}
diff --git a/src/test/modules/injection_points/t/001_prockill_lockgroup.pl b/src/test/modules/injection_points/t/001_prockill_lockgroup.pl
new file mode 100644
index 00000000000..425899ac986
--- /dev/null
+++ b/src/test/modules/injection_points/t/001_prockill_lockgroup.pl
@@ -0,0 +1,146 @@
+# Copyright (c) 2025-2026, PostgreSQL Global Development Group
+#
+# Regression test for the ProcKill lock-group vs. procLatch recycle race.
+#
+# Two backends form a lock group, then are terminated concurrently. The test
+# uses injection points placed inside ProcKill() to pause both victims there
+# and verify that a freshly forked backend can claim the recycled PGPROC slot
+# without hitting "latch already owned by PID ..." PANIC.
+#
+# Requires --enable-injection-points.
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils qw(slurp_file);
+use Test::More;
+use Time::HiRes qw(usleep);
+
+use constant PANIC_RE => qr/PANIC:\s+latch already owned by PID/;
+
+if (($ENV{enable_injection_points} // '') ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my $node = PostgreSQL::Test::Cluster->new('prockill_race');
+$node->init;
+$node->append_conf('postgresql.conf',
+ q{shared_preload_libraries = 'injection_points'});
+$node->start;
+
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+# Form a two-backend lock group.
+my $leader = $node->background_psql('postgres');
+my $follower = $node->background_psql('postgres');
+
+$leader->query_safe('SELECT prockill_become_lock_group_leader()');
+my $leader_pid = $leader->query_safe('SELECT pg_backend_pid()');
+my $follower_pid = $follower->query_safe('SELECT pg_backend_pid()');
+$leader_pid =~ s/\s+//g;
+$follower_pid =~ s/\s+//g;
+
+$follower->query_safe("SELECT prockill_become_lock_group_member($leader_pid)");
+
+# Attach an injection wait from a throwaway controller session, not from
+# $leader/$follower. injection_points_attach() registers a before_shmem_exit
+# cleanup that would fire when the controller's session ends (and, if called
+# from a victim, before ProcKill ever runs). prockill_attach_injection_wait()
+# calls InjectionPointAttach() directly, leaving no such hook.
+$node->safe_psql('postgres',
+ "SELECT prockill_attach_injection_wait('prockill-after-lockgroup')");
+
+# Second injection point, scoped to the leader's PID. This sits between the
+# lock-group block and the post-block DisownLatch in proc.c. After both
+# victims wake from 'prockill-after-lockgroup', the leader will stall here
+# while its procLatch is still owned, and the follower (whose PID does not
+# match the condition) will run through to the consolidated freelist push,
+# publishing the leader's PGPROC with the latch still in the owned state.
+# This is what turns the otherwise empirical race into a deterministic one
+# on branches where DisownLatch is not yet hoisted to the top of ProcKill.
+$node->safe_psql('postgres',
+ "SELECT prockill_attach_injection_wait_pid('prockill-pre-disown-latch', $leader_pid)");
+
+# Cap the default poll_query_until timeout: if the lock-group fix has
+# regressed the postmaster will PANIC mid-cleanup, and we don't want to
+# spend 180s polling against a dead server.
+local $PostgreSQL::Test::Utils::timeout_default = 5;
+
+# Kill the leader and wait for it to pause inside ProcKill.
+# prockill_backend_in_injection() reads PGPROC->wait_event_info directly
+# because pg_stat_activity and ProcArray are already torn down before ProcKill.
+$node->safe_psql('postgres', "SELECT pg_terminate_backend($leader_pid)");
+my $leader_ok = $node->poll_query_until(
+ 'postgres',
+ "SELECT prockill_backend_in_injection($leader_pid, 'prockill-after-lockgroup')"
+);
+
+# Kill the follower. Once the follower starts cleanup, one of two things
+# happens: the follower parks at the injection point (fix in place), or a
+# probe backend grabs the recycled PGPROC slot and PANICs (fix regressed).
+# On the PANIC path the postmaster crashes, so subsequent SQL probes return
+# nothing but connection errors -- the live query side has no way to signal
+# "the race fired". Scanning the server log for PANIC_RE is the only
+# positive evidence of the regression; the dual-condition loop below exits
+# as soon as either condition is observable.
+eval {
+ $node->safe_psql('postgres',
+ "SELECT pg_terminate_backend($follower_pid)");
+};
+my $follower_ok = 0;
+my $deadline = time() + 2;
+while (time() < $deadline)
+{
+ last if slurp_file($node->logfile) =~ PANIC_RE;
+ $follower_ok = eval {
+ $node->safe_psql('postgres',
+ "SELECT prockill_backend_in_injection($follower_pid, 'prockill-after-lockgroup')")
+ eq 't';
+ } // 0;
+ last if $follower_ok;
+ usleep(50_000);
+}
+
+# Release both victims (no-op if the postmaster has already crashed).
+eval {
+ $node->safe_psql('postgres',
+ "SELECT injection_points_wakeup('prockill-after-lockgroup')");
+};
+
+# Release the leader from the second injection point too (the follower never
+# parked there because of the PID-scoped condition). No-op on crash.
+eval {
+ $node->safe_psql('postgres',
+ "SELECT injection_points_wakeup('prockill-pre-disown-latch')");
+};
+
+# A new backend must be able to claim the recycled PGPROC slot without PANIC.
+my $select_ok = eval { $node->safe_psql('postgres', 'SELECT 1'); 1 };
+
+# Decisive check: scan the server log for the exact PANIC the fix prevents.
+# This turns a flaky/slow failure mode into a deterministic one-line diagnosis.
+my $log_contents = slurp_file($node->logfile);
+ok($log_contents !~ PANIC_RE,
+ 'no "latch already owned" PANIC in server log (regression of ProcKill lock-group fix)');
+
+ok($leader_ok && $follower_ok,
+ 'leader and follower reached ProcKill injection points');
+ok($select_ok,
+ 'new backend claimed recycled PGPROC without latch-recycle PANIC');
+
+eval {
+ $node->safe_psql('postgres',
+ "SELECT injection_points_detach('prockill-after-lockgroup')");
+};
+eval {
+ $node->safe_psql('postgres',
+ "SELECT injection_points_detach('prockill-pre-disown-latch')");
+};
+
+eval { $leader->quit };
+eval { $follower->quit };
+$node->stop('fast', fail_ok => 1);
+
+done_testing();
--
2.43.0
From bb71702191757bc086a3e4e2d27529264e074d86 Mon Sep 17 00:00:00 2001
From: Vlad Lesin <[email protected]>
Date: Fri, 15 May 2026 14:27:46 +0300
Subject: [PATCH 4/5] Fix lock-group teardown double-push and leak
This commit fixes two latent bugs in ProcKill()'s lock-group teardown
freelist publication: a double push of the leader's PGPROC that
corrupts the freelist, and a leak of the last follower's PGPROC slot.
ProcKill()'s lock-group teardown had two PGPROC freelist publications
scattered through the function: a follower's inline push of the
leader's PGPROC inside the lock-group block (taken when the follower
is the last group member exiting), and every backend's
lockGroupLeader-NULL-driven self-push at the bottom of the function.
The two were coordinated only by inspection of proc->lockGroupLeader,
which the follower cleared as a side effect of pushing the leader.
This coordination was broken: in the canonical two-backend scenario,
* the follower cleared leader->lockGroupLeader and pushed the
leader's PGPROC inline under leader_lwlock;
* the follower did NOT clear its own proc->lockGroupLeader (that
clearing happens only on the "else if (leader != MyProc)" branch,
which the follower skipped);
* when the leader reached the bottom of ProcKill, it saw
proc->lockGroupLeader == NULL (the follower cleared it) and
pushed itself -- a second dlist_push_tail() of the same node onto
the same freelist;
* the follower at the bottom saw its own proc->lockGroupLeader
!= NULL (never cleared) and skipped its own push; its slot leaked.
The second push corrupts the dlist's prev/next pointers, after which
dlist_pop_head_node() can return invalid memory or hand the same slot
to two backends.
Replace the inline dlist_push_head() in the lock-group block (the
follower-returns-leader path) and the lockGroupLeader-NULL-driven
self-push that followed it with two booleans -- push_leader and
push_self -- decided under leader_lwlock, and consolidate every
freelist push into a single ProcGlobal->freeProcsLock critical
section at the bottom of the function.
Holding leader_lwlock across the decisions is what makes this
reorganization safe:
(a) The leader's "should I self-push?" choice and a follower's
"should I push the leader?" choice are both made inside the same
leader_lwlock critical section, so they cannot interleave. At
most one pusher sets each boolean, and every backend's
proc->lockGroupLeader is cleared whenever the corresponding
push_self is true. This eliminates the double-push and the leak.
(b) A follower only sets push_leader when it observes an empty group
after removing itself, so the leader has already passed its own
lwlock-held dlist_delete by the time a pusher runs.
This commit's structural change does NOT close the related procLatch
recycle race. The leader-slot push is now deferred from inside the
lock-group block to the consolidated critical section below the
follower's own DisownLatch, so the slot is no longer on the freelist
while the follower is parked at the injection point. However, the
follower's deferred push of the leader slot still races the leader's
own DisownLatch: that call runs in the leader's process and is
serialized against the follower's push only by unrelated locks. The
following commit hoists DisownLatch and closes that window.
---
src/backend/storage/lmgr/proc.c | 92 +++++++++++++++++++++++----------
1 file changed, 66 insertions(+), 26 deletions(-)
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 9ba0d4790fd..7201ad9caa8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -924,7 +924,10 @@ static void
ProcKill(int code, Datum arg)
{
PGPROC *proc;
+ PGPROC *leader;
dlist_head *procgloballist;
+ bool push_leader;
+ bool push_self;
Assert(MyProc != NULL);
@@ -960,35 +963,74 @@ ProcKill(int code, Datum arg)
/* Cancel any pending condition variable sleep, too */
ConditionVariableCancelSleep();
+ proc = MyProc;
+ procgloballist = proc->procgloballist;
+
/*
- * Detach from any lock group of which we are a member. If the leader
- * exits before all other group members, its PGPROC will remain allocated
- * until the last group process exits; that process must return the
- * leader's PGPROC to the appropriate list.
+ * Detach from any lock group of which we are a member, deciding under
+ * leader_lwlock whether we (and possibly the leader) need to be pushed
+ * onto a freelist. The actual pushes happen below, under
+ * ProcGlobal->freeProcsLock.
+ *
+ * Holding leader_lwlock across the decisions is what makes this safe:
+ *
+ * (a) The leader's "should I self-push?" choice and a follower's "should
+ * I push the leader?" choice are both made inside the same leader_lwlock
+ * critical section, so they cannot interleave. (That interleaving was
+ * the source of a double push.)
+ *
+ * (b) A follower only sets push_leader when it observes an empty group
+ * after removing itself, so the leader has already passed its own
+ * lwlock-held dlist_delete by the time a pusher runs.
+ *
+ * NOTE: this commit does NOT establish the latch-ownership invariant that
+ * the freelist pushes must follow DisownLatch -- see the next commit for
+ * that.
*/
- if (MyProc->lockGroupLeader != NULL)
+ push_leader = false;
+ push_self = true;
+ leader = NULL;
+
+ if (proc->lockGroupLeader != NULL)
{
- PGPROC *leader = MyProc->lockGroupLeader;
- LWLock *leader_lwlock = LockHashPartitionLockByProc(leader);
+ LWLock *leader_lwlock;
+
+ leader = proc->lockGroupLeader;
+ leader_lwlock = LockHashPartitionLockByProc(leader);
LWLockAcquire(leader_lwlock, LW_EXCLUSIVE);
Assert(!dlist_is_empty(&leader->lockGroupMembers));
- dlist_delete(&MyProc->lockGroupLink);
+ dlist_delete(&proc->lockGroupLink);
if (dlist_is_empty(&leader->lockGroupMembers))
{
leader->lockGroupLeader = NULL;
- if (leader != MyProc)
+ if (leader != proc)
{
- procgloballist = leader->procgloballist;
-
- /* Leader exited first; return its PGPROC. */
- SpinLockAcquire(&ProcGlobal->freeProcsLock);
- dlist_push_head(procgloballist, &leader->freeProcsLink);
- SpinLockRelease(&ProcGlobal->freeProcsLock);
+ /*
+ * We are the last follower and the leader exited earlier; its
+ * PGPROC is still allocated and must be pushed here.
+ */
+ push_leader = true;
+ proc->lockGroupLeader = NULL;
}
+ /* else: leader == proc; the clear above covered us */
+ }
+ else if (leader != proc)
+ {
+ /* Non-last follower; leader still present in the group. */
+ proc->lockGroupLeader = NULL;
+ }
+ else
+ {
+ /*
+ * We are the leader and followers remain. Skip our own push; the
+ * last follower to exit will push us. Leave
+ * proc->lockGroupLeader set until that point so InitProcess's
+ * freshly-picked-up invariant (lockGroupLeader == NULL) is
+ * re-established only when the PGPROC is actually free.
+ */
+ push_self = false;
}
- else if (leader != MyProc)
- MyProc->lockGroupLeader = NULL;
LWLockRelease(leader_lwlock);
/*
@@ -1021,7 +1063,6 @@ ProcKill(int code, Datum arg)
SwitchBackToLocalLatch();
pgstat_reset_wait_event_storage();
- proc = MyProc;
MyProc = NULL;
MyProcNumber = INVALID_PROC_NUMBER;
DisownLatch(&proc->procLatch);
@@ -1031,16 +1072,15 @@ ProcKill(int code, Datum arg)
proc->vxid.procNumber = INVALID_PROC_NUMBER;
proc->vxid.lxid = InvalidTransactionId;
- procgloballist = proc->procgloballist;
SpinLockAcquire(&ProcGlobal->freeProcsLock);
-
- /*
- * If we're still a member of a locking group, that means we're a leader
- * which has somehow exited before its children. The last remaining child
- * will release our PGPROC. Otherwise, release it now.
- */
- if (proc->lockGroupLeader == NULL)
+ if (push_leader)
+ {
+ /* Return leader PGPROC (and semaphore) to appropriate freelist */
+ dlist_push_head(leader->procgloballist, &leader->freeProcsLink);
+ }
+ if (push_self)
{
+ Assert(proc->lockGroupLeader == NULL);
/* Since lockGroupLeader is NULL, lockGroupMembers should be empty. */
Assert(dlist_is_empty(&proc->lockGroupMembers));
--
2.43.0
From a8ecdf34b6c39a07012059eddff503b8259b8423 Mon Sep 17 00:00:00 2001
From: Vlad Lesin <[email protected]>
Date: Thu, 14 May 2026 23:22:01 +0300
Subject: [PATCH 5/5] Fix ProcKill lock-group vs procLatch recycle race
After the preceding refactor, every freelist push happens in a single
ProcGlobal->freeProcsLock critical section near the end of ProcKill().
But SwitchBackToLocalLatch() and DisownLatch(&proc->procLatch) still
ran below that point, so two windows remained in which a PGPROC could
appear on a freelist while its procLatch still had owner_pid != 0:
* follower-returns-leader: a follower's consolidated push of the
leader's PGPROC races the leader's own DisownLatch, which runs in
a different process; neither the lwlock-held decision nor the
freeProcsLock-held push orders that DisownLatch against the push.
* leader-returns-self: a leader that outlives the last follower
pushes its own PGPROC before the DisownLatch on that same slot
has run.
A newly-forked backend that picks up the recycled slot then calls
OwnLatch() and PANICs with "latch already owned by PID ...".
Hoist SwitchBackToLocalLatch() and DisownLatch(&MyProc->procLatch)
to the very top of ProcKill(). Leave
pgstat_reset_wait_event_storage() in its existing position after the
lock-group block, so that wait_event_info remains observable in
PGPROC while a backend may still be inspected there (e.g. parked at
an injection point). The freelist-push consolidation introduced in
the preceding commit gives a single named place where the slot
becomes visible to other backends; this commit ensures every such
publication happens below the hoisted DisownLatch.
The invariant is now easy to state: by the time any PGPROC can be
observed on a freelist, its procLatch has already been disowned.
---
src/backend/storage/lmgr/proc.c | 38 ++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 15 deletions(-)
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 7201ad9caa8..da3bd3a6436 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -963,6 +963,22 @@ ProcKill(int code, Datum arg)
/* Cancel any pending condition variable sleep, too */
ConditionVariableCancelSleep();
+ /*
+ * Reset MyLatch to the process local one and disown the shared latch.
+ * DisownLatch must happen before our PGPROC can appear on a freelist: a
+ * newly-forked backend that pops our slot and calls OwnLatch() would
+ * PANIC on a still-owned latch. The lock-group block below preserves
+ * that order -- see its comment.
+ *
+ * pgstat_reset_wait_event_storage() is intentionally deferred until after
+ * the lock-group block (and any injection points inside it) so that
+ * wait_event_info remains visible in our PGPROC slot while we may be
+ * observed there. It is safe to defer because our slot is not yet on any
+ * freelist at this point.
+ */
+ SwitchBackToLocalLatch();
+ DisownLatch(&MyProc->procLatch);
+
proc = MyProc;
procgloballist = proc->procgloballist;
@@ -980,12 +996,10 @@ ProcKill(int code, Datum arg)
* the source of a double push.)
*
* (b) A follower only sets push_leader when it observes an empty group
- * after removing itself, so the leader has already passed its own
- * lwlock-held dlist_delete by the time a pusher runs.
- *
- * NOTE: this commit does NOT establish the latch-ownership invariant that
- * the freelist pushes must follow DisownLatch -- see the next commit for
- * that.
+ * after removing itself. That can only happen if the leader has already
+ * passed its own lwlock-held dlist_delete, which in turn happened after
+ * the leader's DisownLatch above. So by the time any pusher runs, the
+ * PGPROC it is about to push has already disowned its latch.
*/
push_leader = false;
push_self = true;
@@ -1052,20 +1066,14 @@ ProcKill(int code, Datum arg)
INJECTION_POINT("prockill-pre-disown-latch", NULL);
/*
- * Reset MyLatch to the process local one. This is so that signal
- * handlers et al can continue using the latch after the shared latch
- * isn't ours anymore.
- *
- * Similarly, stop reporting wait events to MyProc->wait_event_info.
- *
- * After that clear MyProc and disown the shared latch.
+ * Stop reporting wait events to MyProc->wait_event_info now that we are
+ * done with any injection-point waits. Do this before zeroing MyProc so
+ * pgstat internals do not try to dereference it.
*/
- SwitchBackToLocalLatch();
pgstat_reset_wait_event_storage();
MyProc = NULL;
MyProcNumber = INVALID_PROC_NUMBER;
- DisownLatch(&proc->procLatch);
/* Mark the proc no longer in use */
proc->pid = 0;
--
2.43.0