On Tue, May 20, 2025 at 8:35 AM Nisha Moond <nisha.moond...@gmail.com> wrote: > > > > > Thanks for the comments, these are handled in the attached v20250516 > > version patch. > > > > Thanks for the patches. Here are my review comments - > > Patch-0004: src/backend/replication/logical/sequencesync.c > > The sequence count logic using curr_seq in copy_sequences() seems buggy. > Currently, curr_seq is incremented based on the number of tuples > received from the publisher inside the inner while loop. > This means it's counting the number of sequences returned by the > publisher, not the number of sequences processed locally. This can > lead to two issues: > > 1) Repeated syncing of sequences: > If some sequences are missing on the publisher, curr_seq will reflect > fewer items than expected, and subsequent batches may reprocess > already-synced sequences. Because next batch will use curr_seq to get > values from the list as - > > seqinfo = (LogicalRepSequenceInfo *) > lfirst(list_nth_cell(remotesequences, curr_seq + i)); > > Example: > For 110 sequences(s1 to s110), if 5 (s1 to s5) are missing on the > publisher in the first batch, curr_seq = 95. In the next cycle, we > resync s95 to s99. > ~~~~ > > 2) Risk of sequencesync worker getting stuck in infinite loop > > Consider a case where remotesequences has 10 sequences (s1–s10) need > syncing, and concurrently s9, s10 are deleted on the publisher. > > Cycle 1: > Publisher returns s1–s8. So curr_seq = 8. > > Cycle 2: > Publisher query returns zero rows (as s9, s10 no longer exist). > curr_seq stays at 8 and never advances. > > This causes the while (curr_seq < total_seq) loop to run forever. > ~~~~ > > I think curr_seq should be incremented by batch_seq_count just > outside the inner while loop. >
I faced the similar issue while testing. I think it is due to the code-logic issue pointed out by Nisha above. Test-scenario: --Created 250 sequences on both pub and sub. --There were 10 sequences mismatched. --Sequence replication worked as expected. Logs look better now: LOG: Logical replication sequence synchronization for subscription "sub1" - total unsynchronized: 250; batch #1 = 100 attempted, 97 succeeded, 3 mismatched LOG: Logical replication sequence synchronization for subscription "sub1" - total unsynchronized: 250; batch #2 = 100 attempted, 95 succeeded, 5 mismatched LOG: Logical replication sequence synchronization for subscription "sub1" - total unsynchronized: 250; batch #3 = 50 attempted, 48 succeeded, 2 mismatched --Then I corrected a few and deleted 1 on pub, and the sequence sync worker went into an infinite loop after that. LOG: Logical replication sequence synchronization for subscription "sub1" - total unsynchronized: 10; batch #1004 = 1 attempted, 0 succeeded, 0 mismatched LOG: Logical replication sequence synchronization for subscription "sub1" - total unsynchronized: 10; batch #1005 = 1 attempted, 0 succeeded, 0 mismatched LOG: Logical replication sequence synchronization for subscription "sub1" - total unsynchronized: 10; batch #1006 = 1 attempted, 0 succeeded, 0 mismatched thanks Shveta