Hi Ivan!

Thanks for the review.

> On 24 Oct 2025, at 19:33, Ivan Bykov <[email protected]> wrote:
> 
> I believe that we should provide more comprehensible feedback to developers 
> by interrupting the deadlock state with a psql timeout. 15 seconds seems safe 
> enough to distinguish between slow node operation and deadlock issue.

Makes sense, but I used $PostgreSQL::Test::Utils::timeout_default seconds.


> On 24 Oct 2025, at 22:16, Bykov Ivan <[email protected]> wrote:
> 
> In GetNewMultiXactId() this code may lead to error
> ---
> ExtendMultiXactOffset(MultiXactState->nextMXact + 1);
> ---
> If MultiXactState->nextMXact = MaxMultiXactId (0xFFFFFFFF)
> we will not extend MultiXactOffset as we should

It will extend SLRU, this calculations are handled in ExtendMultiXactOffset(). 
Moreover, we should eliminate "multi != FirstMultiXactId" bailout condition 
too. I've extended comments and added Assert(). I've added test for this, but 
it's whacky-hacky with dd, resetwal and such. But it uncovered wrond assertion 
in this code:
/* This is an overflow-only branch */
Assert(next_entryno == 0 || next == FirstMultiXactId);
This "next == FirstMultiXactId" was missing.

I also included Kirill's suggestions.

GPT is also worried by initialization of page 0, but I don't fully understand 
it's concerns:
"current MXID’s page is never extended: GetNewMultiXactId() now calls 
ExtendMultiXactOffset(MultiXactState->nextMXact + 1) instead of extending the 
page that will hold result. This skips zeroing the page for the MXID we are 
about to assign, so the first allocation on a fresh cluster (or after wrap) 
tries to write into an uninitialized SLRU page and will fail/crash once the 
buffer manager attempts I/O."

I think we have now good coverage of what happens on fresh cluster and after a 
wraparound.

"wraparound can’t recreate page 0: ExtendMultiXactOffset() now asserts multi != 
FirstMultiXactId, yet after wrap the first ID on page 0 is exactly 
FirstMultiXactId. Because callers no longer pass that value, page 0 is never 
re-created and we keep reusing stale offsets."

Page 0 is actually created via different path on cluster initialization. Though 
everything works fine on wraparound.

Thanks!


Best regards, Andrey Borodin.

Attachment: v10-0001-Avoid-multixact-edge-case-2-by-writing-the-next-.patch
Description: Binary data

Reply via email to