On Tue, 12 Mar 2024 at 09:34, Ajin Cherian <itsa...@gmail.com> wrote: > > > > On Tue, Mar 12, 2024 at 2:59 PM vignesh C <vignes...@gmail.com> wrote: >> >> >> Thanks, I have created the following Commitfest entry for this: >> https://commitfest.postgresql.org/47/4816/ >> >> Regards, >> Vignesh > > > Thanks for the patch, I have verified that the fix works well by following > the steps mentioned to reproduce the problem. > Reviewing the patch, it seems good and is well documented. Just one minor > comment I had was probably to change the name of the variable > table_states_valid to table_states_validity. The current name made sense when > it was a bool, but now that it is a tri-state enum, it doesn't fit well.
Thanks for reviewing the patch, the attached v6 patch has the changes for the same. Regards, Vignesh
From ab68e343b4b3f7ec09f758089a9f90a16f21de42 Mon Sep 17 00:00:00 2001 From: Vignesh C <vignes...@gmail.com> Date: Mon, 5 Feb 2024 14:55:48 +0530 Subject: [PATCH v6 2/2] Apply worker will get started after 180 seconds by the launcher in case the apply worker exits immediately after startup. Apply worker was getting started after 180 seconds tiemout of the launcher in case the apply worker exits immediately after startup. This was happening because the launcher process's latch was getting reset after the apply worker was started, which resulted in the launcher to wait for the next 180 seconds timeout before starting the apply worker. Fixed this issue by not resetting the latch, as this latch will be set for subscription modifications and apply worker exit. We should check if the new worker needs to be started or not and reset the latch in ApplyLauncherMain. --- src/backend/replication/logical/launcher.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c index 66070e9131..f2545482c8 100644 --- a/src/backend/replication/logical/launcher.c +++ b/src/backend/replication/logical/launcher.c @@ -185,7 +185,6 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker, BackgroundWorkerHandle *handle) { BgwHandleStatus status; - int rc; for (;;) { @@ -220,16 +219,14 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker, /* * We need timeout because we generally don't get notified via latch * about the worker attach. But we don't expect to have to wait long. + * Since this latch is also used for subscription creation/modification + * and apply worker process exit signal handling, the latch must not be + * reset here. We should check if the new worker needs to be started or + * not and reset the latch in ApplyLauncherMain. */ - rc = WaitLatch(MyLatch, - WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, - 10L, WAIT_EVENT_BGWORKER_STARTUP); - - if (rc & WL_LATCH_SET) - { - ResetLatch(MyLatch); - CHECK_FOR_INTERRUPTS(); - } + (void) WaitLatch(MyLatch, + WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, + 10L, WAIT_EVENT_BGWORKER_STARTUP); } } -- 2.34.1
From 47c6cd886902693074ece4b2a1188dce6f066bf8 Mon Sep 17 00:00:00 2001 From: Vignesh C <vignes...@gmail.com> Date: Thu, 1 Feb 2024 09:46:40 +0530 Subject: [PATCH v6 1/2] Table sync missed for newly added tables because subscription relation cache invalidation was not handled in certain concurrent scenarios. Table sync was missed if the invalidation of table sync occurs while the non ready tables were getting prepared. This occurrs because the table state was being set to valid at the end of non ready table list preparation irrespective of any inavlidations occurred while preparing the list. Fixed it by changing the boolean variable to an tri-state enum and by setting table state to valid only if no invalidations have been occurred while the list is being prepared. --- src/backend/replication/logical/tablesync.c | 25 +++++++++++++++++---- src/tools/pgindent/typedefs.list | 1 + 2 files changed, 22 insertions(+), 4 deletions(-) diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index ee06629088..eefe620e5e 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -123,7 +123,14 @@ #include "utils/syscache.h" #include "utils/usercontext.h" -static bool table_states_valid = false; +typedef enum +{ + SYNC_TABLE_STATE_NEEDS_REBUILD, + SYNC_TABLE_STATE_REBUILD_STARTED, + SYNC_TABLE_STATE_VALID, +} SyncingTablesState; + +static SyncingTablesState table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD; static List *table_states_not_ready = NIL; static bool FetchTableStates(bool *started_tx); @@ -273,7 +280,7 @@ wait_for_worker_state_change(char expected_state) void invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue) { - table_states_valid = false; + table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD; } /* @@ -1568,13 +1575,15 @@ FetchTableStates(bool *started_tx) *started_tx = false; - if (!table_states_valid) + if (table_states_validity != SYNC_TABLE_STATE_VALID) { MemoryContext oldctx; List *rstates; ListCell *lc; SubscriptionRelState *rstate; + table_states_validity = SYNC_TABLE_STATE_REBUILD_STARTED; + /* Clean the old lists. */ list_free_deep(table_states_not_ready); table_states_not_ready = NIL; @@ -1608,7 +1617,15 @@ FetchTableStates(bool *started_tx) has_subrels = (table_states_not_ready != NIL) || HasSubscriptionRelations(MySubscription->oid); - table_states_valid = true; + /* + * If the subscription relation cache has been invalidated since we + * entered this routine, we still use and return the relations we just + * finished constructing, to avoid infinite loops, but we leave the + * table states marked as stale so that we'll rebuild it again on next + * access. Otherwise, we mark the table states as valid. + */ + if (table_states_validity == SYNC_TABLE_STATE_REBUILD_STARTED) + table_states_validity = SYNC_TABLE_STATE_VALID; } return has_subrels; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index aa7a25b8f8..bd551fde3f 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -2711,6 +2711,7 @@ SupportRequestSelectivity SupportRequestSimplify SupportRequestWFuncMonotonic Syn +SyncingTablesState SyncOps SyncRepConfigData SyncRepStandbyData -- 2.34.1