On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote: > > On Friday, March 29, 2024 2:50 PM Amit Kapila <amit.kapil...@gmail.com> > wrote: > > > > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) > > <houzj.f...@fujitsu.com> > > wrote: > > > > > > > > > Attach a new version patch which fixed an un-initialized variable > > > issue and added some comments. > > > > > > > The other approach to fix this issue could be that the slotsync worker > > get the serialized snapshot using pg_read_binary_file() corresponding > > to restart_lsn and writes those at standby. But there are cases when > > we won't have such a file like (a) when we initially create the slot > > and reach the consistent_point, or (b) also by the time the slotsync > > worker starts to read the remote snapshot file, the snapshot file > > could have been removed by the checkpointer on the primary (if the > > restart_lsn of the remote has been advanced in this window). So, in > > such cases, we anyway need to advance the slot. I think these could be > optimizations that we could do in the future. > > > > Few comments: > > Thanks for the comments. > > > ============= > > 1. > > - if (slot->data.database != MyDatabaseId) > > + if (slot->data.database != MyDatabaseId && !fast_forward) > > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("replication slot \"%s\" was not created in this database", > > @@ -526,7 > > +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn, > > * Do not allow consumption of a "synchronized" slot until the standby > > * gets promoted. > > */ > > - if (RecoveryInProgress() && slot->data.synced) > > + if (RecoveryInProgress() && slot->data.synced && > > + !IsSyncingReplicationSlots()) > > > > > > Add comments at both of the above places. > > Added. > > > > > > > 2. > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > moveto, > > + bool *found_consistent_point); > > + > > > > This API looks a bit awkward as the functionality doesn't match the > > name. How about having a function with name > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > ready_for_decoding) with the same functionality as your patch has for > > pg_logical_replication_slot_advance() and then invoke it both from > > pg_logical_replication_slot_advance and slotsync.c. The function name > > is too big, we can think of a shorter name. Any ideas? > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested > LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be > renamed in next version if we agree). > > Attach the V3 patch which addressed above comments and Kuroda-san's > comments[1]. I also adjusted the tap-test to only check the > confirmed_flush_lsn > after syncing, as the restart_lsn could be different from the remote one due > to > the new slot_advance() call. I am also testing some optimization idea locally > and > will share if ready.
Attach the V4 patch which includes the optimization to skip the decoding if the snapshot at the syncing restart_lsn is already serialized. It can avoid most of the duplicate decoding in my test, and I am doing some more tests locally. Best Regards, Hou zj
v4-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Description: v4-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch