On Fri, May 30, 2025 at 3:38 PM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote:
>
> On Wed, May 28, 2025 at 2:09 AM Masahiko Sawada wrote:
> >
> > On Fri, May 23, 2025 at 10:07 PM Amit Kapila <amit.kapil...@gmail.com>
> > wrote:
> > >
> > > In the case presented here, the logical slot is expected to keep
> > > forwarding, and in the consecutive sync cycle, the sync should be
> > > successful. Users using logical decoding APIs should also be aware
> > > that if, for some reason, the logical slot is not moving forward,
> > > the master/publisher node will start accumulating dead rows and WAL,
> > > which can create bigger problems.
> >
> > I've tried this case and am concerned that the slot synchronization using
> > pg_sync_replication_slots() would never succeed while the primary keeps
> > getting write transactions. Even if the user manually consumes changes on
> > the primary, the primary server keeps advancing its XID in the meanwhile.
> > On the standby, we ensure that TransamVariables->nextXid is beyond the XID
> > of the WAL record that it's going to apply, so the xmin horizon calculated
> > by GetOldestSafeDecodingTransactionId() always ends up higher than the
> > slot's catalog_xmin on the primary. We get the log message "could not
> > synchronize replication slot "s" because remote slot precedes local slot"
> > and clean up the slot on the standby at the end of
> > pg_sync_replication_slots().
>
> To improve this workload scenario, we can modify pg_sync_replication_slots()
> to wait for the primary slot to advance to a suitable position before
> completing synchronization and removing the temporary slot. This would allow
> the sync to complete as soon as the primary slot advances, whether through
> pg_logical_xx_get_changes() or other ways.
>
> I've created a POC (attached) that currently waits indefinitely for the
> remote slot to catch up. We could later add a timeout parameter to control
> the maximum wait time if this approach seems acceptable.
>
> I tested that, when pgbench TPC-B is running on the primary, calling
> pg_sync_replication_slots() on the standby correctly blocks until I advance
> the primary slot position by calling pg_logical_xx_get_changes().
>
> If the basic idea sounds reasonable, then I can start a separate
> thread to extend this API. Thoughts?
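To make the proposed behavior concrete, here is a minimal Python sketch, under stated assumptions, of the wait-then-sync idea: instead of dropping the temporary slot when the remote slot precedes the local one, the standby keeps polling the primary's slot until it catches up. `SlotPosition`, `remote_precedes_local`, `wait_for_remote_catchup`, `make_primary`, and `max_polls` are hypothetical names, not PostgreSQL code. The precedence check is only loosely modeled on the condition behind the "remote slot precedes local slot" message, and XIDs are plain integers with no wraparound handling.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SlotPosition:
    """Simplified slot position; a hypothetical stand-in, not PostgreSQL's struct."""
    restart_lsn: int
    catalog_xmin: int


def remote_precedes_local(remote: SlotPosition, local: SlotPosition) -> bool:
    # The remote slot "precedes" the local one if it is behind on either
    # the WAL position or catalog_xmin. Plain integer comparison only;
    # real XIDs need a wraparound-aware comparison.
    return (remote.restart_lsn < local.restart_lsn
            or remote.catalog_xmin < local.catalog_xmin)


def wait_for_remote_catchup(fetch_remote: Callable[[], SlotPosition],
                            local: SlotPosition,
                            max_polls: Optional[int] = None) -> Optional[int]:
    """Poll the primary until its slot no longer precedes the local one.

    Returns the number of polls taken, or None if max_polls is exhausted.
    max_polls=None waits indefinitely, like the POC described above; a
    finite value plays the role of the suggested timeout parameter.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if not remote_precedes_local(fetch_remote(), local):
            return polls
        # A real implementation would sleep here and check for interrupts.
    return None


def make_primary(start: SlotPosition, lsn_step: int, xid_step: int):
    """Hypothetical primary whose slot a consumer advances between polls."""
    pos = start

    def fetch() -> SlotPosition:
        nonlocal pos
        current = pos
        pos = SlotPosition(pos.restart_lsn + lsn_step,
                           pos.catalog_xmin + xid_step)
        return current

    return fetch


# The standby's locally reserved position is ahead of the primary slot,
# because its xmin horizon is derived from nextXid.
local = SlotPosition(restart_lsn=1000, catalog_xmin=750)
advancing = make_primary(SlotPosition(900, 700), lsn_step=50, xid_step=20)
print(wait_for_remote_catchup(advancing, local, max_polls=10))  # prints 4
stuck = make_primary(SlotPosition(900, 700), lsn_step=0, xid_step=0)
print(wait_for_remote_catchup(stuck, local, max_polls=10))  # prints None
```

With `max_polls=None` the loop blocks until the primary slot moves, which matches the reported behavior under a constant pgbench load; a finite bound is one possible shape for the timeout mentioned above.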

IMHO, this idea has merit. Have you started a separate thread for reviewing this patch?

--
Regards,
Dilip Kumar
Google