On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta.ma...@gmail.com> wrote: > > Please find v2. Changes are: > 1) Rebased the patch as there was conflict due to recent commit 6f132ed. > 2) Added an Assert in update_synced_slots_inactive_since() to ensure > that the slot does not have active_pid. > 3) Improved commit msg and comments. >
Few comments: ============== 1. void SyncReplicationSlots(WalReceiverConn *wrconn) { + /* + * Startup process signaled the slot sync to stop, so if meanwhile user + * has invoked slot sync SQL function, simply return. + */ + SpinLockAcquire(&SlotSyncCtx->mutex); + if (SlotSyncCtx->stopSignaled) + { + ereport(LOG, + errmsg("skipping slot synchronization as slot sync shutdown is signaled during promotion")); + + SpinLockRelease(&SlotSyncCtx->mutex); + return; + } + SpinLockRelease(&SlotSyncCtx->mutex); There is a race condition with this code. Say during promotion ShutDownSlotSync() is just before setting this flag and the user has invoked pg_sync_replication_slots() and passed this check but still didn't set the SlotSyncCtx->syncing flag. So, now, the promotion would recognize that there is slot sync going on in parallel, and slot sync wouldn't know that the promotion is in progress. The current coding for slot sync worker ensures there is no such race by checking the stopSignaled and setting the pid together under spinlock. I think we need to move the setting of the syncing flag similarly. Once we do that probably checking SlotSyncCtx->syncing should be sufficient in ShutDownSlotSync(). If we change the location of setting the 'syncing' flag then please ensure its cleanup as we currently do in slotsync_failure_callback(). 2. @@ -1395,6 +1395,7 @@ update_synced_slots_inactive_since(void) if (s->in_use && s->data.synced) { Assert(SlotIsLogical(s)); + Assert(s->active_pid == 0); We can add a comment like: "The slot must not be acquired by any process" -- With Regards, Amit Kapila.