On Fri, Apr 12, 2024 at 7:57 AM shveta malik <shveta.ma...@gmail.com> wrote: > > On Sat, Apr 6, 2024 at 12:25 PM Bharath Rupireddy > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta.ma...@gmail.com> wrote: > > > > > > Please find v2. Changes are: > > > > Thanks for the patch. Here are some comments. > > Thanks for reviewing. > > > > 1. Can we have a clear saying in the shmem variable who's syncing at > > the moment? Is it a slot sync worker or a backend via SQL function? > > Perhaps turn "bool syncing;" to "SlotSyncSource sync_source;" > > > > typedef enum SlotSyncSource > > { > > SLOT_SYNC_NONE, > > SLOT_SYNC_WORKER, > > SLOT_SYNC_BACKEND, > > } SlotSyncSource; > > > > Then, the check in ShutDownSlotSync can be: > > > > + /* > > + * Return if neither the slot sync worker is running nor the > > function > > + * pg_sync_replication_slots() is executing. > > + */ > > + if ((SlotSyncCtx->pid == InvalidPid) && > > SlotSyncCtx->sync_source != SLOT_SYNC_BACKEND) > > { > >
I don't know if this will be help, especially after fixing the race condition I mentioned. But otherwise, also, at this stage it doesn't seem helpful to add the source of sync explicitly. > > 2. > > SyncReplicationSlots(WalReceiverConn *wrconn) > > { > > + /* > > + * Startup process signaled the slot sync to stop, so if meanwhile user > > + * has invoked slot sync SQL function, simply return. > > + */ > > + SpinLockAcquire(&SlotSyncCtx->mutex); > > + if (SlotSyncCtx->stopSignaled) > > + { > > + ereport(LOG, > > + errmsg("skipping slot synchronization as slot sync > > shutdown is signaled during promotion")); > > + > > > > Unless I'm missing something, I think this can't detect if the backend > > via SQL function is already half-way through syncing in > > synchronize_one_slot. So, better move this check to (or also have it > > there) slot sync loop that calls synchronize_one_slot. To avoid > > spinlock acquisitions, we can perhaps do this check in when we acquire > > the spinlock for synced flag. > > If the sync via SQL function is already half-way, then promotion > should wait for it to finish. I don't think it is a good idea to move > the check to synchronize_one_slot(). The sync-call should either not > start (if it noticed the promotion) or finish the sync and then let > promotion proceed. But I would like to know others' opinion on this. > Agreed. -- With Regards, Amit Kapila.