Hi Heikki, I just reviewed this patch. As offset[x+1] anyway equals offset[x]+nmembers, pre-write offset[x+1] seems a very clever solution.
I got a few questions/comments as below: > On Nov 27, 2025, at 04:59, Heikki Linnakangas <[email protected]> wrote: > > Here's a new version of this. Notable changes: > > - I reverted the changes to ExtendMultiXactOffset(), so that it deals with > wraparound and FirstMultiXactId the same way as before. The caller never > passes FirstMultiXactId, but the changed comments and the assertion were > confusing, so I felt it's best to just leave it alone > > - bunch of comment changes & other cosmetic changes > > - I modified TrimMultiXact() to initialize the page corresponding to > 'nextMulti', because if you just swapped the binary to the new one, and > nextMulti was at a page boundary, it would not be initialized yet. > > If we want to backpatch this, and I think we need to because this fixes real > bugs, we need to think through all the upgrade scenarios. I made the > above-mentioned changes to TrimMultiXact(), but it doesn't fix all the > problems. > > What happens if you replay the WAL generated with old binary, without this > patch, with new binary? It's not good: > > LOG: database system was not properly shut down; automatic recovery in > progress > LOG: redo starts at 0/01766A68 > FATAL: could not access status of transaction 2048 > DETAIL: Could not read from file "pg_multixact/offsets/0000" at offset 8192: > read too few bytes. > CONTEXT: WAL redo at 0/05561030 for MultiXact/CREATE_ID: 2047 offset 4093 > nmembers 2: 2830 (keysh) 2831 (keysh) > LOG: startup process (PID 3130184) exited with exit code 1 > > This is because the WAL, created with old version, contains records like this: > > lsn: 0/05561030, prev 0/05561008, desc: CREATE_ID 2047 offset 4093 nmembers > 2: 2830 (keysh) 2831 (keysh) > lsn: 0/055611A8, prev 0/05561180, desc: ZERO_OFF_PAGE 1 > lsn: 0/055611D0, prev 0/055611A8, desc: CREATE_ID 2048 offset 4095 nmembers > 2: 2831 (keysh) 2832 (keysh) > > > When replaying that with the new version, replay of the CREATE_ID 2047 record > tries to set the next multixid's offset, but the page hasn't been initialized > yet. With the new version, the ZERO_OFF_PAGE 1 record would appear before the > CREATE_ID 2047 record, but we can't change the WAL that already exists. > > - Heikki > <v11-0001-Avoid-multixact-edge-case-2-by-writing-the-next-.patch> 1 ``` + if (*offptr != offset) + { + /* should already be set to the correct value, or not at all */ + Assert(*offptr == 0); + *offptr = offset; + MultiXactOffsetCtl->shared->page_dirty[slotno] = true; + } ``` This is a more like a question. Since pre-write should always happen, in theory *offptr != offset should never be true, why do we still need to handle the case instead of just assert(false)? 2 ``` + next = multi + 1; + if (next < FirstMultiXactId) + next = FirstMultiXactId; ``` next < FirstMultiXactId will only be true when next wraps around to 0, maybe deserve one-line comment to explain that. 3 ``` + if (*next_offptr != offset + nmembers) + { + /* should already be set to the correct value, or not at all */ + Assert(*next_offptr == 0); + *next_offptr = offset + nmembers; + MultiXactMemberCtl->shared->page_dirty[slotno] = true; + } ``` Should MultiXactMemberCtl be MultiXactOffsetCtl? As we are writing to the offset SLRU. 4 ``` +# Another multixact test: loosing some multixact must not affect reading near ``` I guess “loosing” should be “losing” 5 ``` +# multitransaction (to avaoid corener case 1). ``` Typo: avoid corner case 6 ``` +# Verify thet recorded multi is readble, this call must not hang. ``` Typo: readable 7 ``` +# One more multitransaction to effectivelt emit WAL record about next ``` Typo: effectivelt -> effectively 8 ``` + plan skip_all => 'Kill9 works unpredicatably on Windows’; ``` Typo: unpredicatably, no “a” after “c" Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
