On Wed, Apr 7, 2022 at 1:34 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Wed, Apr 6, 2022 at 6:30 PM wangw.f...@fujitsu.com <wangw.f...@fujitsu.com> wrote:
> >
> > On Wed, Apr 6, 2022 at 1:58 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > Also, let's try to evaluate how it impacts lag functionality for large transactions?
> > I think this patch will not affect lag functionality. It will updates the lag
> > field of view pg_stat_replication more frequently.
> > IIUC, when invoking function WalSndUpdateProgress, it will store the lsn of
> > change and invoking time in lag_tracker. Then when invoking function
> > ProcessStandbyReplyMessage, it will calculate the lag field according to the
> > message from subscriber and the information in lag_tracker. This patch does
> > not modify this logic, but only increases the frequency of invoking.
> > Please let me know if I understand wrong.
> >
>
> No, your understanding seems correct to me. But what I want to check
> is if calling the progress function more often has any impact on
> lag-related fields in pg_stat_replication? I think you need to check
> the impact of large transaction replay.

Thanks for the explanation.
After doing some checks, I found that the v13 patch makes the lag calculation
inaccurate.

In short, the v13 patch tries to track lag more frequently and tries to send
keepalive messages to subscribers. But to avoid flooding the lag tracker, we
cannot track lag more than once within WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS
(see function WalSndUpdateProgress). This means we may lose information that
needs to be tracked.

For example, suppose there is a large transaction with LSNs from lsn1 to lsn3.
In HEAD, when we calculate the lag time for lsn3, the lag time of lsn3 is
(now - lsn3.time).
But with the v13 patch, when we calculate the lag time for lsn3, lag_tracker
may have no information for lsn3 but does have information for lsn2, so the
lag time of lsn3 is (now - lsn2.time) (see function LagTrackerRead).
Therefore, if we lose the information that needs to be tracked, the reported
lag time becomes large and inaccurate.

So I skip tracking lag in the middle of a transaction, just as current HEAD
does.

Attached is the new patch.

Regards,
Wang wei
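
To make the lsn1..lsn3 example above concrete, here is a minimal, self-contained C sketch of a throttled lag tracker. It is a simplified model and not PostgreSQL code: the names Tracker, tracker_write, tracker_read, lag_head_like and lag_v13_like are hypothetical, TRACK_INTERVAL_MS only stands in for WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS with an assumed value of 1000 ms, and the LSNs and timestamps are made-up numbers. The read side is assumed to use the newest recorded sample at or below the confirmed LSN, which is how the message describes the effect of LagTrackerRead.

```c
#include <stdio.h>

/* Assumption: stands in for WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS. */
#define TRACK_INTERVAL_MS 1000
#define MAX_SAMPLES 16

typedef struct { long lsn; long time_ms; } Sample;

typedef struct {
    Sample samples[MAX_SAMPLES];
    int    nsamples;
    long   last_write_ms;
    int    has_written;
} Tracker;

static void tracker_init(Tracker *t)
{
    t->nsamples = 0;
    t->last_write_ms = 0;
    t->has_written = 0;
}

/* Record (lsn, now) unless the previous sample is younger than the
 * throttle interval. Returns 1 if recorded, 0 if dropped. */
static int tracker_write(Tracker *t, long lsn, long now_ms)
{
    if (t->has_written && now_ms - t->last_write_ms < TRACK_INTERVAL_MS)
        return 0;               /* throttled: this sample is lost */
    if (t->nsamples >= MAX_SAMPLES)
        return 0;
    t->samples[t->nsamples].lsn = lsn;
    t->samples[t->nsamples].time_ms = now_ms;
    t->nsamples++;
    t->last_write_ms = now_ms;
    t->has_written = 1;
    return 1;
}

/* Lag for a confirmed LSN: now minus the time of the newest sample whose
 * lsn <= confirmed lsn. Samples are stored in increasing LSN order.
 * Returns -1 if no sample qualifies. */
static long tracker_read(const Tracker *t, long lsn, long now_ms)
{
    long lag = -1;
    for (int i = 0; i < t->nsamples; i++)
        if (t->samples[i].lsn <= lsn)
            lag = now_ms - t->samples[i].time_ms;
    return lag;
}

/* HEAD-like: track only at the end of the transaction (lsn3). */
long lag_head_like(void)
{
    Tracker t;
    tracker_init(&t);
    tracker_write(&t, 300, 1500);        /* lsn3 at t=1500 */
    return tracker_read(&t, 300, 1600);  /* reply for lsn3 at t=1600 */
}

/* v13-like: try to track on every change inside the transaction. */
long lag_v13_like(void)
{
    Tracker t;
    tracker_init(&t);
    tracker_write(&t, 100, 0);           /* lsn1 recorded */
    tracker_write(&t, 200, 1000);        /* lsn2 recorded */
    tracker_write(&t, 300, 1500);        /* lsn3 dropped by the throttle */
    return tracker_read(&t, 300, 1600);  /* falls back to lsn2's time */
}
```

In this model lag_head_like() returns 100 (now - lsn3.time), while lag_v13_like() returns 600 (now - lsn2.time), because the sample for lsn3 was dropped by the throttle and the read falls back to lsn2.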
v14-0001-Fix-the-logical-replication-timeout-during-large.patch