On 19/03/2026 20:11, Heikki Linnakangas wrote:> This fix is not correct. The CREATE_ID records can appear in the WAL > out of order, e.g: > > CREATE_ID:N+2 -> CREATE_ID:N+3 -> CREATE_ID:N+1 > > With the patch, replaying the CREATE_ID:N+1 record would overwrite > the changes made by the CREATE_ID:N+2 and CREATE_ID:N+3 records. > Or perhaps track last-zeroed page separately from > latest_page_number, and if we haven't seen any > XLOG_MULTIXACT_ZERO_OFF_PAGE records yet after startup, call > SimpleLruDoesPhysicalPageExist() to determine if initialization > is needed. Attached patch does that. Thanks for the quick reply and the fix! My two patches were both stuck in the trap of trying to fix latest_page_number itself, which as you pointed out is incorrect. Your approach of stepping back and tracking last_initialized_offsets_page separately, with SimpleLruDoesPhysicalPageExist() as the fallback, is much cleaner. I had considered using SimpleLruDoesPhysicalPageExist() myself but dismissed it as too expensive -- looking at it now, it's probably the necessary path for handling unknown state after restart. I've applied your patch on both REL_17_STABLE and REL_18_STABLE and verified it fixes the crash-restart scenario. The same code pattern exists on REL_14_STABLE through REL_16_STABLE, so it should apply there as well. It would be good to have another pair of eyes review the fix across all affected branches. One minor observation on the patch: the comment for last_initialized_offsets_page currently says: > last_initialized_offsets_page is the XLOG_MULTIXACT_ZERO_OFF_PAGE > record that we saw during WAL replay But the variable is also updated when the compat logic in RecordNewMultiXact() initializes a page itself. So it tracks the last page zeroed during replay from any source, not just from ZERO_OFF_PAGE records. Perhaps something like: /* * last_initialized_offsets_page tracks the last offsets page that * was zeroed during WAL replay, whether by replaying an * XLOG_MULTIXACT_ZERO_OFF_PAGE record or by the compat logic in * RecordNewMultiXact(). -1 if no page has been zeroed yet since * the last restart. */ == Maybe add a TAP test? == The compat logic in RecordNewMultiXact() has now been the source of three bugs in a short period: it was originally introduced as an intermediate step in [0], then broken by TRUNCATE replay [1], and now by the crash-restart scenario described in this thread. The code sits at a tricky intersection of WAL ordering, SLRU page management, and crash recovery -- future changes to SLRU initialization, checkpoint logic, or multixact lifecycle could potentially re-introduce a regression. A targeted regression test would help catch such issues early, especially since the failure mode (standby crash loop) is severe and hard to diagnose in production. The compat logic now has two branches: > if (last_initialized_offsets_page == -1) > init_needed = !SimpleLruDoesPhysicalPageExist(..., next_pageno); > else > init_needed = (last_initialized_offsets_page == pageno); The first branch (fallback path) fires after crash-restart, when last_initialized_offsets_page is -1 because no ZERO_OFF_PAGE has been replayed yet. I wrote a TAP test that exercises this path end-to-end: crash-restart the standby, truncate the offset file to remove the next page, then verify the compat logic detects the missing page via SimpleLruDoesPhysicalPageExist() and initializes it so the standby can continue replay. This ensures the fallback path works as a safety net when the tracking state is unknown. The second branch (fast path) relies on last_initialized_offsets_page advancing monotonically. To guard against future code that might modify this variable elsewhere and break the assumption, we could add Asserts at the two write sites: /* at each write site: */ Assert(last_initialized_offsets_page == -1 || MultiXactOffsetPagePrecedes( last_initialized_offsets_page, new_page)); This ensures new_page is always ahead of last_initialized_offsets_page, even across wraparound. The test reproduces the fallback scenario by creating the crash-restart condition where last_initialized_offsets_page is -1 and the next offsets page is physically missing: 1. Using pg_resetwal -m 2047,1 to place nextMulti at the last entry on offset page 0, so the next allocation crosses a page boundary. 2. Using an injection point to pause after GetNewMultiXactId() but before XLogInsert(CREATE_ID), creating the window where CHECKPOINT captures the advanced nextMulti. 3. Triggering CHECKPOINT during the pause, waiting for the standby to replay it and do a restartpoint. 4. Crashing the standby. 5. Truncating the standby's offset file to remove the next page, simulating the old-version condition where ZERO_OFF_PAGE was never written. This is necessary because on a new-version primary, ZERO_OFF_PAGE is always written before CREATE_ID, so the page would normally exist on disk. 6. Waking up the injection point so CREATE_ID gets written. 7. Restarting the standby and verifying the compat logic initializes the missing page. Without your fix applied, the test correctly detects the bug. I've implemented this in attached patches, one for REL_17_STABLE and one for REL_18_STABLE. I'm still looking for a way to add test coverage on REL_14_STABLE through REL_16_STABLE, where the injection point infrastructure is not available. Would appreciate any comments on the test approach or the patches. Regards Duan
0001-Add-multixact-backwards-compatibility-test-REL_17.patch
Description: Binary data
0002-Add-multixact-backwards-compatibility-test-REL_18.patch
Description: Binary data
