On Wed, 13 Mar 2024 at 04:56, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
>
> At Mon, 11 Mar 2024 16:43:32 +0900 (JST), Kyotaro Horiguchi 
> <horikyota....@gmail.com> wrote in
> > Oh, I once saw the fix work, but seems not to be working after some
> > point. The new issue was a corruption of received WAL records on the
> > first standby, and it may be related to the setting.
>
> I identified the cause of the second issue. When I tried to reproduce the
> issue, the second standby accidentally received the old timeline's
> last page-spanning record till the end while the first standby was
> promoting (but it had not been read by recovery). In addition to that,
> on the second standby, there's a time window where the timeline
> increased but the first segment of the new timeline is not available
> yet. In this case, the second standby successfully reads the
> page-spanning record in the old timeline even after the second standby
> noticed that the timeline ID has been increased, thanks to the
> robustness of XLogFileReadAnyTLI().
>
> I think the primary change to XLogPageRead that I suggested is correct
> (assuming the use of wal_segment_size instead of the
> constant). However, XLogFileReadAnyTLI() can still read
> the segment from the old timeline after the second standby notices a
> timeline switch, leading to the second issue. The second issue was
> fixed by preventing XLogFileReadAnyTLI from reading segments from
> older timelines than those suggested by the latest timeline
> history. (In other words, disabling the "AnyTLI" part).
>
> I recall that there was a discussion for commit 4bd0ad9e44, about the
> objective of allowing reading segments from older timelines than the
> timeline history suggests. If I remember correctly, we decided to
> postpone the decision to remove the feature due to uncertainty
> about the objective. If there's no clear reason to continue using
> XLogFileReadAnyTLI(), I suggest we stop its use and instead adopt
> XLogFileReadOnTLHistory(), which reads segments that align precisely
> with the timeline history.


This sounds very similar to the problem described in [1]. And I think
both will be resolved by that change.
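
To make the failure mode concrete, here is a toy model in C of the two
lookup strategies discussed above. It is only a sketch and not PostgreSQL
code: TimelineEntry, lookup_any_tli, lookup_on_history and the demo data
are hypothetical names invented for illustration, and the timeline history
is reduced to "TLI plus the first segment it covers". The value 0 stands
for "no segment found".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified timeline history entry (newest entry first). */
typedef struct
{
    uint32_t tli;        /* timeline ID */
    uint64_t begin_seg;  /* first segment number this timeline covers */
} TimelineEntry;

/* Hypothetical callback: does the segment file for (tli, segno) exist? */
typedef bool (*SegmentExistsFn)(uint32_t tli, uint64_t segno);

/*
 * "Any timeline" lookup: try the newest timeline first and fall back to
 * older ones when the file is missing. Returns the TLI read from, or 0.
 */
uint32_t
lookup_any_tli(const TimelineEntry *hist, size_t n, uint64_t segno,
               SegmentExistsFn exists)
{
    for (size_t i = 0; i < n; i++)
    {
        if (hist[i].begin_seg > segno)
            continue;           /* this timeline starts after segno */
        if (exists(hist[i].tli, segno))
            return hist[i].tli; /* may be an older timeline */
    }
    return 0;
}

/*
 * Strict lookup: only the newest timeline that covers segno according to
 * the history is considered. A missing file means "not available yet".
 */
uint32_t
lookup_on_history(const TimelineEntry *hist, size_t n, uint64_t segno,
                  SegmentExistsFn exists)
{
    for (size_t i = 0; i < n; i++)
    {
        if (hist[i].begin_seg <= segno)
            return exists(hist[i].tli, segno) ? hist[i].tli : 0;
    }
    return 0;
}

/* Demo: timeline 2 starts at segment 5, timeline 1 at segment 0. */
static const TimelineEntry demo_history[] = { {2, 5}, {1, 0} };

/* Demo: only timeline 1 files (segments 0..5) have arrived so far. */
static bool
demo_segment_exists(uint32_t tli, uint64_t segno)
{
    return tli == 1 && segno <= 5;
}
```

With the demo data, segment 5 on timeline 2 has not arrived yet. The
fallback lookup returns timeline 1 for segment 5, which corresponds to the
window described in the quoted mail, while the strict lookup reports that
nothing is available and the caller would have to wait.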

[1] 
https://postgr.es/m/CANwKhkMN3QwAcvuDZHb6wsvLRtkweBiYso-KLFykkQVWuQLcOw%40mail.gmail.com

