On Tue, 12 Mar 2024 at 09:34, Ajin Cherian <itsa...@gmail.com> wrote:
>
>
>
> On Tue, Mar 12, 2024 at 2:59 PM vignesh C <vignes...@gmail.com> wrote:
>>
>>
>> Thanks, I have created the following Commitfest entry for this:
>> https://commitfest.postgresql.org/47/4816/
>>
>> Regards,
>> Vignesh
>
>
> Thanks for the patch, I have verified that the fix works well by following 
> the steps mentioned to reproduce the problem.
> Reviewing the patch, it seems good and is well documented. Just one minor 
> comment I had was probably to change the name of the variable 
> table_states_valid to table_states_validity. The current name made sense when 
> it was a bool, but now that it is a tri-state enum, it doesn't fit well.

Thanks for reviewing the patch, the attached v6 patch has the changes
for the same.

Regards,
Vignesh
From ab68e343b4b3f7ec09f758089a9f90a16f21de42 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignes...@gmail.com>
Date: Mon, 5 Feb 2024 14:55:48 +0530
Subject: [PATCH v6 2/2] Apply worker will get started after 180 seconds by the
 launcher in case the apply worker exits immediately after startup.

Apply worker was getting started after 180 seconds tiemout of the
launcher in case the apply worker exits immediately after startup. This
was happening because the launcher process's latch was getting reset
after the apply worker was started, which resulted in the launcher to
wait for the next 180 seconds timeout before starting the apply worker.
Fixed this issue by not resetting the latch, as this latch will be set for
subscription modifications and apply worker exit. We should check if the
new worker needs to be started or not and reset the latch in ApplyLauncherMain.
---
 src/backend/replication/logical/launcher.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 66070e9131..f2545482c8 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -185,7 +185,6 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
 							   BackgroundWorkerHandle *handle)
 {
 	BgwHandleStatus status;
-	int			rc;
 
 	for (;;)
 	{
@@ -220,16 +219,14 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
 		/*
 		 * We need timeout because we generally don't get notified via latch
 		 * about the worker attach.  But we don't expect to have to wait long.
+		 * Since this latch is also used for subscription creation/modification
+		 * and apply worker process exit signal handling, the latch must not be
+		 * reset here. We should check if the new worker needs to be started or
+		 * not and reset the latch in ApplyLauncherMain.
 		 */
-		rc = WaitLatch(MyLatch,
-					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-					   10L, WAIT_EVENT_BGWORKER_STARTUP);
-
-		if (rc & WL_LATCH_SET)
-		{
-			ResetLatch(MyLatch);
-			CHECK_FOR_INTERRUPTS();
-		}
+		(void) WaitLatch(MyLatch,
+						 WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 10L, WAIT_EVENT_BGWORKER_STARTUP);
 	}
 }
 
-- 
2.34.1

From 47c6cd886902693074ece4b2a1188dce6f066bf8 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignes...@gmail.com>
Date: Thu, 1 Feb 2024 09:46:40 +0530
Subject: [PATCH v6 1/2] Table sync missed for newly added tables because
 subscription relation cache invalidation was not handled in certain
 concurrent scenarios.

Table sync was missed if the invalidation of table sync occurs while
the non ready tables were getting prepared. This occurrs because
the table state was being set to valid at the end of non ready table
list preparation irrespective of any inavlidations occurred while
preparing the list. Fixed it by changing the boolean variable to an
tri-state enum and by setting table state to valid only if no
invalidations have been occurred while the list is being prepared.
---
 src/backend/replication/logical/tablesync.c | 25 +++++++++++++++++----
 src/tools/pgindent/typedefs.list            |  1 +
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index ee06629088..eefe620e5e 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -123,7 +123,14 @@
 #include "utils/syscache.h"
 #include "utils/usercontext.h"
 
-static bool table_states_valid = false;
+typedef enum
+{
+	SYNC_TABLE_STATE_NEEDS_REBUILD,
+	SYNC_TABLE_STATE_REBUILD_STARTED,
+	SYNC_TABLE_STATE_VALID,
+} SyncingTablesState;
+
+static SyncingTablesState table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
 static List *table_states_not_ready = NIL;
 static bool FetchTableStates(bool *started_tx);
 
@@ -273,7 +280,7 @@ wait_for_worker_state_change(char expected_state)
 void
 invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
 {
-	table_states_valid = false;
+	table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
 }
 
 /*
@@ -1568,13 +1575,15 @@ FetchTableStates(bool *started_tx)
 
 	*started_tx = false;
 
-	if (!table_states_valid)
+	if (table_states_validity != SYNC_TABLE_STATE_VALID)
 	{
 		MemoryContext oldctx;
 		List	   *rstates;
 		ListCell   *lc;
 		SubscriptionRelState *rstate;
 
+		table_states_validity = SYNC_TABLE_STATE_REBUILD_STARTED;
+
 		/* Clean the old lists. */
 		list_free_deep(table_states_not_ready);
 		table_states_not_ready = NIL;
@@ -1608,7 +1617,15 @@ FetchTableStates(bool *started_tx)
 		has_subrels = (table_states_not_ready != NIL) ||
 			HasSubscriptionRelations(MySubscription->oid);
 
-		table_states_valid = true;
+		/*
+		 * If the subscription relation cache has been invalidated since we
+		 * entered this routine, we still use and return the relations we just
+		 * finished constructing, to avoid infinite loops, but we leave the
+		 * table states marked as stale so that we'll rebuild it again on next
+		 * access. Otherwise, we mark the table states as valid.
+		 */
+		if (table_states_validity == SYNC_TABLE_STATE_REBUILD_STARTED)
+			table_states_validity = SYNC_TABLE_STATE_VALID;
 	}
 
 	return has_subrels;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index aa7a25b8f8..bd551fde3f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2711,6 +2711,7 @@ SupportRequestSelectivity
 SupportRequestSimplify
 SupportRequestWFuncMonotonic
 Syn
+SyncingTablesState
 SyncOps
 SyncRepConfigData
 SyncRepStandbyData
-- 
2.34.1

Reply via email to