Hi,

Thank you for the response.

On Tue, 17 Mar 2026 at 03:40, Heikki Linnakangas wrote:

>
> Replaying the record will perform the same sanity checks against
> wraparound as the primary does.
>
> Hmm, although why did I not apply commit 817f74600d to 'master', only
> backbranches? The bug that it fixed was related to minor version
> upgrade, and thus it was not needed on 'master', but the code change
> would nevertheless make a lot of sense on 'master' too.
>

Agreed. Once 817f74600d is on master, the standby would actually evaluate
the SimpleLruTruncate() wraparound backstop instead of bypassing it.

However, the backstop is documented as catching "wraparound bugs elsewhere
in SLRU handling." If such a bug corrupts latest_page_number on the
primary, the standby would not share the same corruption, because it
derives its latest_page_number independently, from ZERO_OFF_PAGE replay and
StartupMultiXact(). The primary would skip the truncation, but the standby
would see a healthy latest_page_number and proceed, so the two would
diverge.


> Have you been able to reproduce that?
>

I have reproduced the primary-side condition on an unmodified tree using
gdb in batch mode: attach to the VACUUM backend after
WriteMTruncateXlogRec() returns, corrupt latest_page_number, and resume.
The primary logs "apparent wraparound" and skips the physical deletion,
while pg_waldump confirms the TRUNCATE_ID record is present in the WAL. I
have not yet set up a streaming replica to demonstrate end-to-end
divergence and promotion failure.

>
> I agree that would probably be better. I'm not sure how straightforward
> it will be to implement though, I wouldn't want to add much extra code
> just for this.
>

One approach that might keep the footprint small: we could inline the same
PagePrecedes check that SimpleLruTruncate uses directly in
TruncateMultiXact(), before START_CRIT_SECTION(). Something like:

/*
 * Apply the same wraparound backstop as SimpleLruTruncate(), but before
 * writing WAL, so a truncation skipped here is never replayed elsewhere.
 */
if (MultiXactOffsetCtl->PagePrecedes(
        pg_atomic_read_u64(&MultiXactOffsetCtl->shared->latest_page_number),
        MultiXactIdToOffsetPage(PreviousMultiXactId(newOldestMulti))) ||
    MultiXactMemberCtl->PagePrecedes(
        pg_atomic_read_u64(&MultiXactMemberCtl->shared->latest_page_number),
        MXOffsetToMemberPage(newOldestOffset)))
{
    ereport(LOG,
            (errmsg("skipping multixact truncation due to apparent wraparound")));
    LWLockRelease(MultiXactTruncationLock);
    return;
}

No new functions and no changes to slru.c or the replay path: just the same
condition, evaluated earlier, so that we never enter the critical section or
write WAL for a truncation that won't be carried out. Does this seem like a
reasonable direction?

Regards,
Ayush
