On Friday, February 2, 2024 2:03 PM Bertrand Drouvot <bertranddrouvot...@gmail.com> wrote:
>
> Hi,
>
> On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote:
> > Attached v75 patch-set. Changes are:
> >
> > 1) Re-arranged the patches:
> > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are
> > separated out in v75-001 as those are independent changes.
> > 1.2) 'Add logical slot sync capability', 'Slot sync worker as special
> > process' and 'App-name changes' are now merged to single patch which
> > makes v75-002.
> > 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation
> > Document' patches are maintained as is (v75-003 and v75-004 now).
>
> Thanks!
>
> I only looked at the commit message for v75-0002 and see that it has changed
> since the comment done in [1], but it still does not look correct to me.
>
> "
> If a logical slot on the primary is valid but is invalidated on the standby, then
> that slot is dropped and recreated on the standby in next sync-cycle provided
> the slot still exists on the primary server. It is okay to recreate such slots as long
> as these are not consumable on the standby (which is the case currently). This
> situation may occur due to the following reasons:
> - The max_slot_wal_keep_size on the standby is insufficient to retain WAL
> records from the restart_lsn of the slot.
> - primary_slot_name is temporarily reset to null and the physical slot is
> removed.
> - The primary changes wal_level to a level lower than logical.
> "
>
> If a logical decoding slot "still exists on the primary server" then the primary
> can not change the wal_level to lower than logical, one would get something
> like:
>
> "FATAL: logical replication slot "logical_slot" exists, but wal_level < logical"
>
> and then slots won't get invalidated on the standby. I've the feeling that the
> wal_level conflict part may need to be explained separately? (I think it's not
> possible that they end up being re-created on the standby for this conflict,
> they will be simply removed as it would mean the counterpart one on the
> primary does not exist anymore).

This is possible in some extreme cases, because slots are synced asynchronously. For example:

1) On the primary, wal_level is changed to 'replica' and then back to 'logical', so the standby receives two XLOG_PARAMETER_CHANGE records.
2) Before the standby replays these records, a user creates a failover slot on the primary, which is allowed because wal_level is 'logical' again.
3) The slotsync worker syncs that slot to the standby before the startup process has replayed the XLOG_PARAMETER_CHANGE records.
4) When the startup process then replays the XLOG_PARAMETER_CHANGE that lowered wal_level to 'replica', the just-synced slot is invalidated. Since the slot still exists on the primary, it is dropped and re-created on the standby in the next sync cycle.

That said, this doesn't seem like a real-world case, so I am not sure it is worth a separate explanation.

Best Regards,
Hou zj