Re: Improve pg_sync_replication_slots() to wait for primary to advance

Amit Kapila Mon, 16 Feb 2026 20:16:12 -0800

On Tue, Feb 17, 2026 at 9:13 AM shveta malik <[email protected]> wrote:
>
> On Mon, Feb 16, 2026 at 4:35 PM Amit Kapila <[email protected]> wrote:
> >
> > On Fri, Feb 13, 2026 at 7:54 AM Zhijie Hou (Fujitsu)
> > <[email protected]> wrote:
> > >
> > > Thanks for pushing! Here are the remaining patches.
> > >
> >
> > One thing that bothers me about the remaining patch is that it could
> > lead to infinite retries in the worst case. For example, in the first
> > try, slot-1 is not synced, say due to physical replication delays in
> > flushing WAL up to the confirmed_flush_lsn of that slot; then in the
> > next (re-)try, the same thing happens for slot-2; then in the next
> > (re-)try, slot-3 appears to be invalidated on the standby but is valid
> > on the primary, and so on. What do you think?
>
> Yes, that is a possibility we cannot rule out. This can also happen
> during the first invocation of the API (even without the new changes):
> when we attempt to create new slots, they may remain in a temporary
> state indefinitely. However, that risk is limited to the initial sync,
> until the slots are persisted, which is somewhat expected behavior.
>


Right.

> With the current changes though, the possibility of an indefinite wait
> exists during every run. So the question becomes: what would be more
> desirable for users -- for the API to finish with the risk that a few
> slots are not synced, or for the API to wait longer to ensure that all
> slots are properly synced?
>
> I think that if the primary use case of this API is when a user plans
> to run it before a scheduled failover, then it would be better for the
> API to wait and ensure everything is properly synced.
>

I don't think we can guarantee that all slots are synced as per the
latest primary state in one invocation, because some newly created
slots can be missed anyway. So why take the risk of infinite waits in
the API? I think it may be better to extend the usage of this API
(probably with more parameters) based on more user feedback.
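
To make the trade-off concrete, here is a minimal standalone sketch of a
bounded retry loop. It is not code from the patch: sync_slots_bounded,
try_sync_fn, rotating_failure, always_ok and the max_attempts parameter
are all hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical callback: returns true if the slot at slot_index was
 * synced during the given pass.
 */
typedef bool (*try_sync_fn) (size_t slot_index, int attempt, void *arg);

/*
 * Make at most max_attempts passes over nslots slots, stopping early as
 * soon as a pass syncs every slot. Returns the number of slots that
 * failed in the last pass (0 means a full pass succeeded).
 */
static size_t
sync_slots_bounded(size_t nslots, int max_attempts,
                   try_sync_fn try_sync, void *arg)
{
    size_t      failed = nslots;

    assert(max_attempts > 0);

    for (int attempt = 0; attempt < max_attempts; attempt++)
    {
        failed = 0;
        for (size_t i = 0; i < nslots; i++)
        {
            if (!try_sync(i, attempt, arg))
                failed++;
        }
        if (failed == 0)
            break;              /* every slot synced in this pass */
    }
    return failed;
}

/*
 * Worst case described above: a different slot fails on every pass, so
 * an unbounded loop would never terminate. arg points to the slot count.
 */
static bool
rotating_failure(size_t slot_index, int attempt, void *arg)
{
    size_t      nslots = *(size_t *) arg;

    return slot_index != (size_t) attempt % nslots;
}

/* Trivial case: every slot syncs on the first pass. */
static bool
always_ok(size_t slot_index, int attempt, void *arg)
{
    (void) slot_index;
    (void) attempt;
    (void) arg;
    return true;
}
```

With three slots and rotating_failure, every pass leaves exactly one
slot unsynced, so the loop only ends because of the cap. Whether such a
cap (or a timeout) belongs in the API is exactly the kind of parameter
I would rather add based on user feedback.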

> But I am not
> very sure about the use case, though. What do you think?
>
> > Independent of whether we consider the entire patch, the following bit
> > in the patch is useful as we retry to sync the slots via the API.
> > @@ -218,7 +219,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> >   * Can get here only if GUC 'synchronized_standby_slots' on the
> >   * primary server was not configured correctly.
> >   */
> > - ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
> > + ereport(LOG,
> >   errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >   errmsg("skipping slot synchronization because the received slot sync"
> >      " LSN %X/%08X for slot \"%s\" is ahead of the standby position %X/%08X",
> >
>
> Yes, I agree.
>

Let's wait for Hou-San's opinion on this one.

-- 
With Regards,
Amit Kapila.
