Thank you very much for the detailed response. I will proceed with the native implementation for synchronizing logical replication slots. In a maintenance context, when a standby is shut down, it is possible to temporarily disable the synchronized_standby_slots parameter to avoid blocking logical replication on the primary.
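On the primary, something along these lines should work (a sketch only; 'sb1_slot' is a placeholder slot name, and since the parameter should be reloadable a reload rather than a restart ought to be enough):

    -- before stopping the standby for maintenance, stop waiting on its slot
    ALTER SYSTEM SET synchronized_standby_slots = '';
    SELECT pg_reload_conf();

    -- ... standby maintenance ...

    -- once the standby is back and has caught up, restore the setting
    ALTER SYSTEM SET synchronized_standby_slots = 'sb1_slot';  -- placeholder name
    SELECT pg_reload_conf();

A rough sketch of the overall failover-slot setup discussed below is appended after the quoted thread.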
Regards,
Fabrice

On Thu, Jun 5, 2025 at 8:57 AM shveta malik <shveta.ma...@gmail.com> wrote:
> On Wed, Jun 4, 2025 at 4:01 PM Fabrice Chapuis <fabrice636...@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm working with logical replication in a PostgreSQL 17 setup, and I'm exploring the new synchronized_standby_slots parameter to make replication slots failover-safe in a highly available environment using physical standby nodes managed by Patroni.
> >
> > While testing this feature, I encountered a blocking behavior: when a standby is listed in synchronized_standby_slots and that standby goes offline, logical replication on the primary stops progressing. From what I understand, the primary node waits for the standby to acknowledge received WAL records, effectively stalling WAL decoding for the logical slot. I noticed that the failover slot on the standby continues to be synced.
>
> Yes, your understanding is correct.
>
> > This raises several questions about the tradeoffs and implications of using this feature:
> >
> > What are the risks or limitations if synchronized_standby_slots is left empty (the default)? Is there a risk of data loss or inconsistency for logical subscribers in such cases?
>
> If the 'synchronized_standby_slots' setting is left unset, logical replication subscribers may progress ahead of the physical standby servers. In the event of a failover under such conditions, the new primary might lack the necessary data to continue supporting logical replication, even if synchronized slots are in place, resulting in unexpected behavior. Therefore, it is strongly recommended to configure 'synchronized_standby_slots' properly to ensure that all configured physical standbys have received and flushed the changes before those changes are made visible to logical replication subscribers.
>
> > Is it expected behavior that any failure of a standby listed in synchronized_standby_slots stalls logical decoding on the primary? If so, are there any ways to avoid blocking WAL decoding while still having slot synchronization?
>
> Yes, this is expected behavior. It is similar to how 'synchronous_standby_names' works, where a commit on the primary is allowed to proceed only after the configured standby servers acknowledge receipt of the data. The main difference is that 'synchronous_standby_names' provides more configuration options, such as FIRST and ANY, allowing the system to wait for a subset of standbys rather than all of them. However, if none of the configured standbys are available, the primary will still wait, just like in this case, until a standby becomes available or the configuration is changed. In the future, if needed, similar flexibility (e.g., support for ANY, FIRST) could potentially be extended to 'synchronized_standby_slots' as well. For now, the way to move forward is either by updating the configuration or by restoring the standby to an operational state.
>
> > Is Patroni managing FO slots better than the native Postgres implementation?
>
> I'm not entirely certain about that. However, PostgreSQL does handle several complex scenarios, such as:
> -- Ensuring seamless logical replication on failover by allowing users to configure potential failover candidates via synchronized_standby_slots, making synced slots ready for failover in all situations.
> -- Ensuring consistency by avoiding a direct copy of a slot unless a consistent point can be reached with the new values. Otherwise, after promotion, the slots may not reach a consistent point, potentially resulting in data loss.
> -- Supporting two-phase transactions for failover slots, where transactions prepared before two_phase decoding is enabled are handled correctly even if the failover occurs immediately afterward.
>
> You may want to check with the Patroni community for more detailed insights. We're open to considering any gaps or missing functionality in PostgreSQL as well.
>
> thanks
> Shveta
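For reference, a minimal failover-slot setup along the lines discussed above might look roughly like this; all server, slot, publication, and subscription names are placeholders, not taken from this thread:

    -- on the primary: create a physical slot for the standby and make
    -- logical walsenders wait for it before sending changes to subscribers
    SELECT pg_create_physical_replication_slot('sb1_slot');
    ALTER SYSTEM SET synchronized_standby_slots = 'sb1_slot';
    SELECT pg_reload_conf();

    -- on the standby: enable the slot synchronization worker
    -- (requires primary_slot_name = 'sb1_slot', hot_standby_feedback = on,
    --  and a dbname in primary_conninfo)
    ALTER SYSTEM SET sync_replication_slots = on;
    SELECT pg_reload_conf();

    -- on the subscriber: mark the subscription's slot as a failover slot
    CREATE SUBSCRIPTION sub1
        CONNECTION 'host=primary dbname=appdb'
        PUBLICATION pub1
        WITH (failover = true);

    -- on the standby: check that the logical slot is being synchronized
    SELECT slot_name, failover, synced FROM pg_replication_slots;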