On Tue, May 20, 2025 at 8:35 AM Nisha Moond <nisha.moond...@gmail.com> wrote:
>
> >
> > Thanks for the comments, these are handled in the attached v20250516
> > version patch.
> >
>
> Thanks for the patches. Here are my review comments -
>
> Patch-0004: src/backend/replication/logical/sequencesync.c
>
> The sequence count logic using curr_seq in copy_sequences() seems buggy.
> Currently, curr_seq is incremented based on the number of tuples
> received from the publisher inside the inner while loop.
> This means it's counting the number of sequences returned by the
> publisher, not the number of sequences processed locally. This can
> lead to two issues:
>
> 1) Repeated syncing of sequences:
> If some sequences are missing on the publisher, curr_seq will reflect
> fewer items than expected, and subsequent batches may reprocess
> already-synced sequences. Because next batch will use curr_seq to get
> values from the list as -
>
>   seqinfo = (LogicalRepSequenceInfo *)
> lfirst(list_nth_cell(remotesequences, curr_seq + i));
>
> Example:
> For 110 sequences(s1 to s110), if 5 (s1 to s5) are missing on the
> publisher in the first batch, curr_seq = 95. In the next cycle, we
> resync s95 to s99.
> ~~~~
>
> 2) Risk of sequencesync worker getting stuck in infinite loop
>
> Consider a case where remotesequences has 10 sequences (s1–s10) need
> syncing, and concurrently s9, s10 are deleted on the publisher.
>
> Cycle 1:
> Publisher returns s1–s8. So curr_seq = 8.
>
> Cycle 2:
> Publisher query returns zero rows (as s9, s10 no longer exist).
> curr_seq stays at 8 and never advances.
>
> This causes the while (curr_seq < total_seq) loop to run forever.
> ~~~~
>
> I think curr_seq  should be incremented by batch_seq_count just
> outside the inner while loop.
>

I faced the similar issue while testing. I think it is due to the
code-logic issue pointed out by Nisha above.

Test-scenario:
--Created 250 sequences on both pub and sub.
--There were 10 sequences mismatched.
--Sequence replication worked as expected. Logs look better now:

LOG:  Logical replication sequence synchronization for subscription
"sub1" - total unsynchronized: 250; batch #1 = 100 attempted, 97
succeeded, 3 mismatched
LOG:  Logical replication sequence synchronization for subscription
"sub1" - total unsynchronized: 250; batch #2 = 100 attempted, 95
succeeded, 5 mismatched
LOG:  Logical replication sequence synchronization for subscription
"sub1" - total unsynchronized: 250; batch #3 = 50 attempted, 48
succeeded, 2 mismatched

--Then I corrected a few and deleted 1 on pub, and the sequence sync
worker went into an infinite loop after that.

LOG:  Logical replication sequence synchronization for subscription
"sub1" - total unsynchronized: 10; batch #1004 = 1 attempted, 0
succeeded, 0 mismatched
LOG:  Logical replication sequence synchronization for subscription
"sub1" - total unsynchronized: 10; batch #1005 = 1 attempted, 0
succeeded, 0 mismatched
LOG:  Logical replication sequence synchronization for subscription
"sub1" - total unsynchronized: 10; batch #1006 = 1 attempted, 0
succeeded, 0 mismatched


thanks
Shveta


Reply via email to