Hi,

On Thu, Feb 15, 2024 at 02:09:45PM +0900, Michael Paquier wrote:
> On Thu, Jan 18, 2024 at 02:20:28PM +0000, Bertrand Drouvot wrote:
> > On Thu, Jan 18, 2024 at 04:59:39PM +0530, Bharath Rupireddy wrote:
> >> I'm not sure if it can happen at all, but I think we can rely on the
> >> previous conflict reason instead of the previous
> >> effective_xmin/effective_catalog_xmin/restart_lsn.
> > 
> > I'm not sure what you mean here. I think we should still keep the
> > "initial" LSN so that the next check done with it still makes sense.
> > The previous conflict reason, as you're proposing, also makes sense to
> > me, but for another reason: PANIC in case the issue still happens (for
> > cases we did not think about, i.e. not covered by what the added
> > previous LSNs are covering).
> 
> Using a PANIC seems like an overreaction to me for this path.  Sure,
> we should not record a conflict reason twice, but wouldn't an
> assertion be better adapted to check that a termination is not
> logged twice for this code path run by the checkpointer?

Thanks for looking at it!

Agreed, using an assertion instead in v3 attached.

> 
> +            if (!terminated)
> +            {
> +                initial_restart_lsn = s->data.restart_lsn;
> +                initial_effective_xmin = s->effective_xmin;
> +                initial_catalog_effective_xmin = s->effective_catalog_xmin;
> +            }
> 
> It seems important to document why we are saving this data here: while
> we hold the LWLock for repslots, before we perform any termination, as
> we want to protect conflict reports from incorrect data if the slot
> data got changed while the LWLock is temporarily released during a
> termination.

Yeah, comments added in v3.

> >> 3. Is there a way to reproduce this racy issue, maybe by adding some
> >> sleeps or something? If yes, can you share it here, just for the
> >> record, and to verify that whatever fix is provided actually works?
> > 
> > Alexander was able to reproduce it on a slow machine, and the issue
> > was no longer there with v1 in place. I think it's tricky to reproduce
> > as it would need the slot to advance between the 2 checks.
> 
> I'd really want a test for that in the long term.  And an injection
> point stuck between the point where ReplicationSlotControlLock is
> released and then re-acquired when there is an active PID should be
> enough, shouldn't it?

Yeah, that should be enough.

> For example just after the kill()?  It seems to me
> that we should stall the checkpointer, perform a forced advance of the
> xmins, release the checkpointer and see the effects of the change in
> the second loop.  Now, modules/injection_points/ does not allow that
> yet, but I've posted a patch along these lines that can stop a process
> and release it using a condition variable (should be 0006 on [1]).  I
> was planning to start a new thread with a patch posted for the next CF
> to add this kind of facility with a regression test for an old bug;
> the patch needs a few hours of love first.  I should be able to post
> that next week.

Great, that looks like a good idea!
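
Out of curiosity, I looked at where such a point could go. Something like
this in InvalidatePossiblyObsoleteSlot(), just after the kill() (untested
sketch: the point name is a placeholder, and it assumes your 0006
wait/release facility is in, plus an #include of utils/injection_point.h
in slot.c):

    (void) kill(active_pid, SIGTERM);

    /*
     * Placeholder name: the test would attach a "wait" callback here,
     * so that the checkpointer stops after terminating the process
     * holding the slot and before re-checking it in the next loop
     * iteration.
     */
    INJECTION_POINT("invalidate-possibly-obsolete-slot");

    last_signaled_pid = active_pid;

The test would then advance the slot's xmins while the checkpointer waits
on that point, wake it up, and verify that the reported conflict still
matches the initial values.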

Are we going to fix this on master and 16 stable first, and then later add
a test on master with the injection points?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From eb8bb9eda4c1ea5fe2abe87df66ef2b86cdb2441 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Thu, 11 Jan 2024 13:57:53 +0000
Subject: [PATCH v3] Fix race condition in InvalidatePossiblyObsoleteSlot()

In case of an active slot we proceed in 2 steps:
- terminate the backend holding the slot
- report the slot as obsolete

This is racy because we release the mutex on the slot between the two
steps, which means the restart_lsn, effective_xmin and
effective_catalog_xmin could advance during that time.

Holding the mutex longer is not an option (given what we'd have to do
while holding it), so instead record the values as they were when we
decided to terminate the backend and use them for the subsequent checks.
---
 src/backend/replication/slot.c | 39 ++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)
 100.0% src/backend/replication/

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 2180a38063..33f76c4acc 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1454,6 +1454,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 {
 	int			last_signaled_pid = 0;
 	bool		released_lock = false;
+	bool		terminated = false;
+	TransactionId initial_effective_xmin = InvalidTransactionId;
+	TransactionId initial_catalog_effective_xmin = InvalidTransactionId;
+	XLogRecPtr	initial_restart_lsn = InvalidXLogRecPtr;
+	ReplicationSlotInvalidationCause conflict_prev PG_USED_FOR_ASSERTS_ONLY = RS_INVAL_NONE;
 
 	for (;;)
 	{
@@ -1488,11 +1493,24 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 		 */
 		if (s->data.invalidated == RS_INVAL_NONE)
 		{
+			/*
+			 * We'll release the slot's mutex soon, so it's possible that
+			 * those values change since the process holding the slot has
+			 * been terminated (if any); record them here to make sure we
+			 * then report the correct invalidation cause.
+			 */
+			if (!terminated)
+			{
+				initial_restart_lsn = s->data.restart_lsn;
+				initial_effective_xmin = s->effective_xmin;
+				initial_catalog_effective_xmin = s->effective_catalog_xmin;
+			}
+
 			switch (cause)
 			{
 				case RS_INVAL_WAL_REMOVED:
-					if (s->data.restart_lsn != InvalidXLogRecPtr &&
-						s->data.restart_lsn < oldestLSN)
+					if (initial_restart_lsn != InvalidXLogRecPtr &&
+						initial_restart_lsn < oldestLSN)
 						conflict = cause;
 					break;
 				case RS_INVAL_HORIZON:
@@ -1501,12 +1519,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 					/* invalid DB oid signals a shared relation */
 					if (dboid != InvalidOid && dboid != s->data.database)
 						break;
-					if (TransactionIdIsValid(s->effective_xmin) &&
-						TransactionIdPrecedesOrEquals(s->effective_xmin,
+					if (TransactionIdIsValid(initial_effective_xmin) &&
+						TransactionIdPrecedesOrEquals(initial_effective_xmin,
 													  snapshotConflictHorizon))
 						conflict = cause;
-					else if (TransactionIdIsValid(s->effective_catalog_xmin) &&
-							 TransactionIdPrecedesOrEquals(s->effective_catalog_xmin,
+					else if (TransactionIdIsValid(initial_catalog_effective_xmin) &&
+							 TransactionIdPrecedesOrEquals(initial_catalog_effective_xmin,
 														   snapshotConflictHorizon))
 						conflict = cause;
 					break;
@@ -1519,6 +1537,13 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 			}
 		}
 
+		/*
+		 * The conflict cause recorded previously, before we terminated the
+		 * process owning the slot (if any), is not supposed to change.
+		 */
+		Assert(!(conflict_prev != RS_INVAL_NONE && terminated &&
+				 conflict_prev != conflict));
+
 		/* if there's no conflict, we're done */
 		if (conflict == RS_INVAL_NONE)
 		{
@@ -1601,6 +1626,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
 					(void) kill(active_pid, SIGTERM);
 
 				last_signaled_pid = active_pid;
+				terminated = true;
+				conflict_prev = conflict;
 			}
 
 			/* Wait until the slot is released. */
-- 
2.34.1
