Hi!

I'm working on providing smooth failover for a CDC system in an HA cluster.
Currently, we do not replicate logical slots, so when we promote a replica the
slots do not exist on it. This makes it impossible to continue change data
capture (CDC) from the new primary after failover.

We cannot start logical replication from an LSN different from the slot's LSN,
and we cannot create a slot at an LSN in the past, in particular one before or
right after promotion.

This leads to a massive waste of network bandwidth in our installations, because
an initial table sync becomes necessary.

We are considering using an extension that creates a replication slot at an LSN
in the past [0]. I understand that there might be some caveats with logical
replication, but I do not see the scale of the possible implications of this
approach. The user gets an error if the WAL has been rotated, or waits if the
LSN has not been reached yet; this seems perfectly fine for us. In most of our
cases, when the CDC agent detects a failover and moves to the new primary, there
are plenty of old WALs available to restart CDC.
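
To make the expected behaviour concrete, here is a minimal sketch in Python. It
is only a model of the two conditions named above (WAL already rotated, LSN not
reached yet) and is not the extension's code. The names parse_lsn, decide and
SlotDecision are hypothetical, as are the oldest_available and current inputs,
which stand in for whatever the server would report. The only PostgreSQL fact
it relies on is the textual LSN format: two hexadecimal numbers separated by a
slash, representing the high and low 32 bits of a 64-bit position.

```python
from enum import Enum


class SlotDecision(Enum):
    """Hypothetical outcomes for a request to create a slot at a given LSN."""
    ERROR_WAL_REMOVED = "error"  # the requested WAL has already been rotated away
    WAIT = "wait"                # the requested LSN has not been reached yet
    OK = "ok"                    # the requested LSN lies within the retained WAL


def parse_lsn(text: str) -> int:
    """Convert a textual 'hi/lo' hexadecimal LSN into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def decide(requested: str, oldest_available: str, current: str) -> SlotDecision:
    """Model only the two conditions from the message (assumption, not real API)."""
    req = parse_lsn(requested)
    if req < parse_lsn(oldest_available):
        return SlotDecision.ERROR_WAL_REMOVED
    if req > parse_lsn(current):
        return SlotDecision.WAIT
    return SlotDecision.OK
```

For example, decide("0/4000000", "0/3000000", "0/5000000") falls inside the
retained range, while a request below "0/3000000" would be rejected.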

Are there strong reasons why we do not allow creating slots at a given LSN,
possibly within a narrow LSN range (but wider than just GetXLogInsertRecPtr())?

Thanks!

Best regards, Andrey Borodin.


[0] https://github.com/x4m/pg_tm_aux/blob/master/pg_tm_aux.c#L74-L77


