On Mon, Sep 8, 2025 at 11:21 PM shveta malik <shveta.ma...@gmail.com> wrote:
>
> Hi,
>
> This is a spin-off thread from [1].
>
> Currently, in the slot-sync worker, we have an error scenario [2]
> where, during slot synchronization, if we detect a slot with the same
> name and its synced flag is set to false, we emit an error. The
> rationale is to avoid potentially overwriting a user-created slot.
>
> But while analyzing [1], we observed that this error can lead to
> inconsistent behavior during switchovers. On the first switchover, the
> new standby logs an error: "Exiting from slot synchronization because
> a slot with the same name already exists on the standby." But during
> a double switchover, this error does not occur.
>
> Upon re-evaluating this, it seems more appropriate to clear the synced
> flag after promotion, as the flag does not hold any meaning on the
> primary. Doing so would ensure consistent behavior across all
> switchovers, as the same error will be raised avoiding the risk of
> overwriting user's slots.

There is the following comment in FinishWalRecovery():

    /*
     * Shutdown the slot sync worker to drop any temporary slots acquired by
     * it and to prevent it from keep trying to fetch the failover slots.
     *
     * We do not update the 'synced' column in 'pg_replication_slots' system
     * view from true to false here, as any failed update could leave 'synced'
     * column false for some slots. This could cause issues during slot sync
     * after restarting the server as a standby. While updating the 'synced'
     * column after switching to the new timeline is an option, it does not
     * simplify the handling for the 'synced' column. Therefore, we retain the
     * 'synced' column as true after promotion as it may provide useful
     * information about the slot origin.
     */
    ShutDownSlotSync();

Does the patch address the above concerns?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com