On Wed, 24 Apr 2024 at 11:59, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Wed, Mar 13, 2024 at 9:19 AM vignesh C <vignes...@gmail.com> wrote:
> >
> > On Tue, 12 Mar 2024 at 09:34, Ajin Cherian <itsa...@gmail.com> wrote:
> > >
> > >
> > >
> > > On Tue, Mar 12, 2024 at 2:59 PM vignesh C <vignes...@gmail.com> wrote:
> > >>
> > >>
> > >> Thanks, I have created the following Commitfest entry for this:
> > >> https://commitfest.postgresql.org/47/4816/
> > >>
> > >> Regards,
> > >> Vignesh
> > >
> > >
> > > Thanks for the patch, I have verified that the fix works well by 
> > > following the steps mentioned to reproduce the problem.
> > > Reviewing the patch, it seems good and is well documented. Just one minor 
> > > comment I had was probably to change the name of the variable 
> > > table_states_valid to table_states_validity. The current name made sense 
> > > when it was a bool, but now that it is a tri-state enum, it doesn't fit 
> > > well.
> >
> > Thanks for reviewing the patch, the attached v6 patch has the changes
> > for the same.
> >
>
> v6_0001* looks good to me. This should be backpatched unless you or
> others think otherwise.

This patch needs to be committed in master,PG16 and PG15.
This is not required from PG14 onwards because they don't have
HasSubscriptionRelations call before updating table_states_valid:
    /*
     * Does the subscription have tables?
     *
     * If there were not-READY relations found then we know it does. But
     * if table_states_not_ready was empty we still need to check again to
     * see if there are 0 tables.
     */
    has_subrels = (table_states_not_ready != NIL) ||
      HasSubscriptionRelations(MySubscription->oid);

So the invalidation function will not be called here.

Whereas for PG15 and newer versions this is applicable:
HasSubscriptionRelations calls table_open function which will get the
invalidate callback like in:
HasSubscriptionRelations -> table_open -> relation_open ->
LockRelationOid -> AcceptInvalidationMessages ->
ReceiveSharedInvalidMessages -> LocalExecuteInvalidationMessage ->
CallSyscacheCallbacks -> invalidate_syncing_table_states

The attached patch
v7-0001-Table-sync-missed-due-to-race-condition-in-subscr.patch
applies for master and PG16 branch,
v7-0001-Table-sync-missed-due-to-race-condition-in-subscr_PG15.patch
applies for PG15 branch.

Regards,
Vignesh
From 93116320bd9ffbcddc83a1fc253cba90c56cb47e Mon Sep 17 00:00:00 2001
From: Vignesh C <vignes...@gmail.com>
Date: Thu, 1 Feb 2024 09:46:40 +0530
Subject: [PATCH v7] Table sync missed due to race condition in subscription
 cache handling.

Table sync was missed if the invalidation of table sync occurs while
the non ready tables were getting prepared. This occurrs because
the table state was being set to valid at the end of non ready table
list preparation irrespective of any inavlidations occurred while
preparing the list. Fixed it by changing the boolean variable to an
tri-state enum and by setting table state to valid only if no
invalidations have been occurred while the list is being prepared.
---
 src/backend/replication/logical/tablesync.c | 25 +++++++++++++++++----
 src/tools/pgindent/typedefs.list            |  1 +
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 81fff1919d..7d4ee48206 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -122,7 +122,14 @@
 #include "utils/syscache.h"
 #include "utils/usercontext.h"
 
-static bool table_states_valid = false;
+typedef enum
+{
+	SYNC_TABLE_STATE_NEEDS_REBUILD,
+	SYNC_TABLE_STATE_REBUILD_STARTED,
+	SYNC_TABLE_STATE_VALID,
+} SyncingTablesState;
+
+static SyncingTablesState table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
 static List *table_states_not_ready = NIL;
 static bool FetchTableStates(bool *started_tx);
 
@@ -272,7 +279,7 @@ wait_for_worker_state_change(char expected_state)
 void
 invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
 {
-	table_states_valid = false;
+	table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
 }
 
 /*
@@ -1556,13 +1563,15 @@ FetchTableStates(bool *started_tx)
 
 	*started_tx = false;
 
-	if (!table_states_valid)
+	if (table_states_validity != SYNC_TABLE_STATE_VALID)
 	{
 		MemoryContext oldctx;
 		List	   *rstates;
 		ListCell   *lc;
 		SubscriptionRelState *rstate;
 
+		table_states_validity = SYNC_TABLE_STATE_REBUILD_STARTED;
+
 		/* Clean the old lists. */
 		list_free_deep(table_states_not_ready);
 		table_states_not_ready = NIL;
@@ -1596,7 +1605,15 @@ FetchTableStates(bool *started_tx)
 		has_subrels = (table_states_not_ready != NIL) ||
 			HasSubscriptionRelations(MySubscription->oid);
 
-		table_states_valid = true;
+		/*
+		 * If the subscription relation cache has been invalidated since we
+		 * entered this routine, we still use and return the relations we just
+		 * finished constructing, to avoid infinite loops, but we leave the
+		 * table states marked as stale so that we'll rebuild it again on next
+		 * access. Otherwise, we mark the table states as valid.
+		 */
+		if (table_states_validity == SYNC_TABLE_STATE_REBUILD_STARTED)
+			table_states_validity = SYNC_TABLE_STATE_VALID;
 	}
 
 	return has_subrels;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4ec490e362..5a9353c493 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2676,6 +2676,7 @@ SupportRequestSelectivity
 SupportRequestSimplify
 SupportRequestWFuncMonotonic
 Syn
+SyncingTablesState
 SyncOps
 SyncRepConfigData
 SyncRepStandbyData
-- 
2.34.1

From 461cfc99ac1d85d22629ec4880febb307460e4d5 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignes...@gmail.com>
Date: Wed, 24 Apr 2024 11:27:17 +0530
Subject: [PATCH v7] Table sync missed due to race condition in subscription
 cache handling.

Table sync was missed if the invalidation of table sync occurs while
the non ready tables were getting prepared. This occurrs because
the table state was being set to valid at the end of non ready table
list preparation irrespective of any inavlidations occurred while
preparing the list. Fixed it by changing the boolean variable to an
tri-state enum and by setting table state to valid only if no
invalidations have been occurred while the list is being prepared.
---
 src/backend/replication/logical/tablesync.c | 25 +++++++++++++++++----
 src/tools/pgindent/typedefs.list            |  1 +
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c3dd902f8c..e6159acba0 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -120,7 +120,14 @@
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 
-static bool table_states_valid = false;
+typedef enum
+{
+	SYNC_TABLE_STATE_NEEDS_REBUILD,
+	SYNC_TABLE_STATE_REBUILD_STARTED,
+	SYNC_TABLE_STATE_VALID,
+} SyncingTablesState;
+
+static SyncingTablesState table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
 static List *table_states_not_ready = NIL;
 static bool FetchTableStates(bool *started_tx);
 
@@ -270,7 +277,7 @@ wait_for_worker_state_change(char expected_state)
 void
 invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
 {
-	table_states_valid = false;
+	table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
 }
 
 /*
@@ -1465,13 +1472,15 @@ FetchTableStates(bool *started_tx)
 
 	*started_tx = false;
 
-	if (!table_states_valid)
+	if (table_states_validity != SYNC_TABLE_STATE_VALID)
 	{
 		MemoryContext oldctx;
 		List	   *rstates;
 		ListCell   *lc;
 		SubscriptionRelState *rstate;
 
+		table_states_validity = SYNC_TABLE_STATE_REBUILD_STARTED;
+
 		/* Clean the old lists. */
 		list_free_deep(table_states_not_ready);
 		table_states_not_ready = NIL;
@@ -1505,7 +1514,15 @@ FetchTableStates(bool *started_tx)
 		has_subrels = (list_length(table_states_not_ready) > 0) ||
 			HasSubscriptionRelations(MySubscription->oid);
 
-		table_states_valid = true;
+		/*
+		 * If the subscription relation cache has been invalidated since we
+		 * entered this routine, we still use and return the relations we just
+		 * finished constructing, to avoid infinite loops, but we leave the
+		 * table states marked as stale so that we'll rebuild it again on next
+		 * access. Otherwise, we mark the table states as valid.
+		 */
+		if (table_states_validity == SYNC_TABLE_STATE_REBUILD_STARTED)
+			table_states_validity = SYNC_TABLE_STATE_VALID;
 	}
 
 	return has_subrels;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d030c485fd..78f3ec0caf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2635,6 +2635,7 @@ SupportRequestSelectivity
 SupportRequestSimplify
 SupportRequestWFuncMonotonic
 Syn
+SyncingTablesState
 SyncOps
 SyncRepConfigData
 SyncRepStandbyData
-- 
2.34.1

Reply via email to