At Mon, 11 Mar 2024 16:43:32 +0900 (JST), Kyotaro Horiguchi <horikyota....@gmail.com> wrote in
> Oh, I once saw the fix work, but seems not to be working after some
> point. The new issue was a corruption of received WAL records on the
> first standby, and it may be related to the setting.

I identified the cause of the second issue. When I tried to reproduce the issue, the second standby happened to receive the old timeline's last page-spanning record through to its end while the first standby was promoting (but it had not been read by recovery). In addition, on the second standby there is a time window in which the timeline has increased but the first segment of the new timeline is not yet available. In this case, the second standby successfully reads the page-spanning record in the old timeline even after it has noticed that the timeline ID has increased, thanks to the robustness of XLogFileReadAnyTLI().

I think the primary change to XLogPageRead() that I suggested is correct (assuming the use of wal_segment_size instead of the constant). However, XLogFileReadAnyTLI() still has a chance to read the segment from the old timeline after the second standby notices a timeline switch, which leads to the second issue. The second issue was fixed by preventing XLogFileReadAnyTLI() from reading segments from timelines older than those suggested by the latest timeline history (in other words, by disabling the "AnyTLI" part).

I recall that there was a discussion around commit 4bd0ad9e44 about the purpose of allowing segments to be read from timelines older than the timeline history suggests. As far as I can recall, we decided to postpone the decision to remove the feature because of uncertainty about that purpose. If there's no clear reason to continue using XLogFileReadAnyTLI(), I suggest we stop using it and instead adopt XLogFileReadOnTLHistory(), which reads segments that align precisely with the timeline history.

Of course, regardless of the changes above, if recovery on the second standby had reached the end of the page-spanning record before redirection to the first standby, it would need pg_rewind to connect to the first standby.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
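
For readers following the thread, the difference between the two lookup policies discussed above can be illustrated with a toy model. This is only a sketch, not PostgreSQL source: the struct, the function names (toy_read_any_tli, toy_read_on_tl_history), the callback, and the sample history are all hypothetical, and the model works at whole-segment granularity, ignoring positions inside a segment. It assumes the history is ordered newest first and that a timeline "owns" every segment from its first segment onward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef struct
{
    uint32_t tli;        /* timeline ID */
    uint64_t begin_seg;  /* first segment number belonging to this timeline */
} ToyTimelineEntry;

/* Hypothetical stand-in for "is this segment file present locally?" */
typedef bool (*ToySegmentExistsFn)(uint32_t tli, uint64_t segno);

/*
 * Permissive policy: walk the history newest first and return the first
 * timeline that has a file for segno. If the newest timeline's file is
 * missing, this silently falls back to an older timeline's file.
 * Returns 0 when no file is found.
 */
uint32_t
toy_read_any_tli(const ToyTimelineEntry *history, size_t n,
                 uint64_t segno, ToySegmentExistsFn exists)
{
    assert(history != NULL && exists != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (segno < history[i].begin_seg)
            continue;           /* this timeline starts after segno */
        if (exists(history[i].tli, segno))
            return history[i].tli;
    }
    return 0;
}

/*
 * Strict policy: only the timeline that the history assigns to segno is
 * tried. If its file is missing, report "not available" (0) instead of
 * falling back to an older timeline.
 */
uint32_t
toy_read_on_tl_history(const ToyTimelineEntry *history, size_t n,
                       uint64_t segno, ToySegmentExistsFn exists)
{
    assert(history != NULL && exists != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (segno < history[i].begin_seg)
            continue;           /* this timeline starts after segno */
        return exists(history[i].tli, segno) ? history[i].tli : 0;
    }
    return 0;
}

/* Sample scenario: timeline 2 begins at segment 5, timeline 1 at segment 0. */
const ToyTimelineEntry toy_history[2] = {{2, 5}, {1, 0}};

/* Only timeline 1 files exist yet (timeline 2's first segment not fetched). */
bool
toy_only_old_timeline_present(uint32_t tli, uint64_t segno)
{
    (void) segno;
    return tli == 1;
}
```

In the sample scenario, asking for segment 5 with the permissive policy yields timeline 1 (the fallback that corresponds to the window described in the message), while the strict policy reports the segment as unavailable until timeline 2's file shows up. Both policies agree on segment 4, which the history assigns to timeline 1.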