On Wed, Jun 11, 2025 at 7:19 AM shveta malik <shveta.ma...@gmail.com> wrote:
>
> On Tue, Jun 10, 2025 at 3:20 PM Zhijie Hou (Fujitsu)
> <houzj.f...@fujitsu.com> wrote:
> >
> >
> > Thanks for updating the patch.
> >
> > I have few suggestions for the document from a user's perspective.
> >
>
> Thanks Hou-San, I agree with your suggestions. Addressed in v4.
>
> Also addressed Amit's suggestion at [1] to improve errdetail.
>

So, the overall direction we are taking here is that we want to
improve the existing LOG/DEBUG messages and docs for HEAD and back
branches. Then we will improve the API behavior based on Hou-San's
patch for PG19. Let me know if you or others think otherwise.

+ <para>
+ Apart from enabling <link linkend="guc-sync-replication-slots">
+ <varname>sync_replication_slots</varname></link> to synchronize slots
+ periodically, failover slots can be manually synchronized by invoking
+ <link linkend="pg-sync-replication-slots">
+ <function>pg_sync_replication_slots</function></link> on the standby.
+ However, this function is primarily intended for testing and debugging
+ purposes and should be used with caution. The recommended approach to
+ synchronize slots is by enabling <link linkend="guc-sync-replication-slots">
+ <varname>sync_replication_slots</varname></link> on the standby, as it
+ ensures continuous and automatic synchronization of replication slots,
+ facilitating seamless failover and high availability.
+ </para>
+
+ <para>
+ When slot-synchronization setup is done as recommended, and
+ slot-synchronization is performed the very first time either automatically
+ or by <link linkend="pg-sync-replication-slots">
+ <function>pg_sync_replication_slots</function></link>,
+ then for the synchronized slot to be created and persisted on the standby,
+ one condition must be met. The logical replication slot on the primary
+ must reach a state where the WALs and system catalog rows retained by
+ the slot are also present on the corresponding standby server. This is
+ needed to prevent any data loss and to allow logical replication to continue
+ seamlessly through the synchronized slot if needed after promotion.
+ If the WALs and system catalog rows retained by the slot on the primary have
+ already been purged from the standby server, and synchronization is attempted
+ for the first time, then to prevent the data loss as explained, persistence
+ and synchronization of newly created slot will be skipped, and the following
+ log message may appear on standby.
+<programlisting>
+ LOG: could not synchronize replication slot "failover_slot"
+ DETAIL: Synchronization could lead to data loss as the remote slot needs WAL at LSN 0/3003F28 and catalog xmin 754, but the standby has LSN 0/3003F28 and catalog xmin 756
+</programlisting>
+ If the logical replication slot is actively consumed by a consumer, no further
+ manual action is needed by the user, as the slot on primary will be advanced
+ automatically, and synchronization will proceed in the next cycle. However,
+ if no logical replication consumer is set up yet, to advance the slot, it
+ is recommended to manually run the <link linkend="pg-logical-slot-get-changes">
+ <function>pg_logical_slot_get_changes</function></link> or
+ <link linkend="pg-logical-slot-get-binary-changes">
+ <function>pg_logical_slot_get_binary_changes</function></link> on the primary
+ slot and allow synchronization to proceed.
+ </para>
+

I have reworded the above as follows:

To enable periodic synchronization of replication slots, it is
recommended to activate sync_replication_slots on the standby server.
While manual synchronization is possible using
pg_sync_replication_slots, this function is primarily intended for
testing and debugging and should be used with caution. Automatic
synchronization via sync_replication_slots ensures continuous slot
updates, supporting seamless failover and maintaining high
availability.

When slot synchronization is configured as recommended, and the
initial synchronization is performed either automatically or manually
via pg_sync_replication_slots, the standby can persist the
synchronized slot only if the following condition is met: The logical
replication slot on the primary must retain WALs and system catalog
rows that are still available on the standby. This ensures data
integrity and allows logical replication to continue smoothly after
promotion. If the required WALs or catalog rows have already been
purged from the standby, the slot will not be persisted to avoid data
loss. In such cases, the following log message may appear:

LOG: could not synchronize replication slot "failover_slot"
DETAIL: Synchronization could lead to data loss as the remote slot needs WAL at LSN 0/3003F28 and catalog xmin 754, but the standby has LSN 0/3003F28 and catalog xmin 756

If the logical replication slot is actively used by a consumer, no
manual intervention is needed; the slot will advance automatically,
and synchronization will resume in the next cycle. However, if no
consumer is configured, it is advisable to manually advance the slot
on the primary using pg_logical_slot_get_changes or
pg_logical_slot_get_binary_changes, allowing synchronization to
proceed.

Let me know what you think of the above.

--
With Regards,
Amit Kapila.
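
For readers who want to sanity-check the example log message, the persistence condition described above can be modelled roughly as follows. This is only an illustrative Python sketch, not the server's implementation: the helper names parse_lsn and can_persist_slot are hypothetical, the comparison rule is my reading of the wording in the message, and transaction ID wraparound is deliberately ignored.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' in hex (e.g. '0/3003F28') to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def can_persist_slot(remote_lsn: str, remote_catalog_xmin: int,
                     standby_lsn: str, standby_catalog_xmin: int) -> bool:
    """Return True if the standby still holds what the remote slot needs.

    Assumption: xmin values are compared as plain integers; real
    transaction IDs wrap around, which this sketch does not handle.
    """
    # The WAL the remote slot needs must not be older than what the standby has.
    wal_available = parse_lsn(remote_lsn) >= parse_lsn(standby_lsn)
    # The catalog rows the remote slot needs must not be older either.
    catalog_available = remote_catalog_xmin >= standby_catalog_xmin
    return wal_available and catalog_available


# Values from the example log message: the LSNs match, but catalog xmin 754
# is older than the standby's 756, so synchronization is skipped.
print(can_persist_slot("0/3003F28", 754, "0/3003F28", 756))  # False

# Hypothetical values after the slot on the primary has advanced.
print(can_persist_slot("0/3003F60", 757, "0/3003F28", 756))  # True
```

In the logged case the check fails only because of catalog xmin, which is why advancing the slot on the primary (by a consumer or manually) lets the next cycle succeed.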