On Tue, May 13, 2025 at 4:22 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Tue, May 13, 2025 at 3:48 PM shveta malik <shveta.ma...@gmail.com> wrote:
> >
> > Hi All,
> >
> > It is a spin-off thread from earlier discussions at [1] and [2].
> >
> > While analyzing the slot-sync BF failure as stated in [1], it was
> > observed that there are chances that confirmed_flush_lsn may move
> > backward depending on the feedback messages received from the
> > downstream system. It was suspected that the backward movement of
> > confirmed_flush_lsn may result in data duplication issues. Earlier we
> > were able to successfully reproduce the issue with two_phase enabled
> > subscriptions (see [2]). On further analysis, it seems possible
> > that data duplication issues may happen without two-phase as well.
>
> Thanks for the detailed explanation. Before we focus on patching the
> symptoms, I’d like to explore whether the issue can be addressed on
> the subscriber side. Specifically, have we analyzed if there’s a way
> to prevent the subscriber from moving the LSN backward in the first
> place? That might lead to a cleaner and more robust solution overall.
>

The subscriber doesn't move the LSN backwards; it only reports to the
publisher the latest remote LSN tracked by its replication origin. As
explained in email [1], the subscriber doesn't persistently store or
advance the LSN for changes it doesn't have to apply, such as DDLs or
non-published DMLs. However, it still needs to send confirmation of
such LSNs for synchronous replication. This is noted in the code as
well; see the comments in CreateDecodingContext ("It might seem like we
should error out in this case, but it's pretty common for a client to
acknowledge a LSN it doesn't have to do anything for ..."). As
mentioned in email [1], persisting LSN information for changes the
subscriber doesn't have to do anything with could add noticeable
performance overhead.

I think it is better to deal with this in the publisher by not
allowing it to move the confirmed_flush LSN backwards, as Shveta
proposed.

[1]: 
https://www.postgresql.org/message-id/CAA4eK1%2BzWQwOe5G8zCYGvErnaXh5%2BDbyg_A1Z3uywSf_4%3DT0UA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.

