Hi,

On 2022-02-19 20:57:57 -0800, Noah Misch wrote:
> On Wed, Feb 16, 2022 at 03:43:12PM +0700, John Naylor wrote:
> > On Wed, Feb 16, 2022 at 6:17 AM Peter Geoghegan <p...@bowt.ie> wrote:
> > > On Tue, Feb 15, 2022 at 9:28 AM Peter Geoghegan <p...@bowt.ie> wrote:
> >
> > > > I did notice from my own testing of the failsafe (by artificially
> > > > inducing wraparound failure using an XID burning C function) that
> > > > autovacuum seemed to totally correct the problem, even when the system
> > > > had already crossed xidStopLimit - it came back on its own. I wasn't
> > > > completely sure of how robust this effect was, though.
> >
> > I'll put some effort in finding any way that it might not be robust.
>
> A VACUUM may create a not-trivially-bounded number of multixacts via
> FreezeMultiXactId(). In a cluster at multiStopLimit, completing VACUUM
> without error needs preparation something like:
>
> 1. Kill each XID that might appear in a multixact.
> 2. Resolve each prepared transaction that might appear in a multixact.
> 3. Run VACUUM. At this point, multiStopLimit is blocking new multixacts from
>    other commands, and the lack of running multixact members removes the need
>    for FreezeMultiXactId() to create multixacts.
>
> Adding to the badness of single-user mode so well described upthread, one can
> enter it without doing (2) and then wrap the nextMXact counter.

If we collected the information along the lines of what I proposed in the
second half of
https://www.postgresql.org/message-id/20220204013539.qdegpqzvayq3d4y2%40alap3.anarazel.de
we should be able to handle such cases more intelligently, I think?

We could e.g. add an error if FreezeMultiXactId() needs to create a new
multixact for a far-in-the-past xid. That's not great, of course, but if we
include the precise cause (pid of backend / prepared xact name / slot name /
...) necessitating creating a new multi, it'd still be a significant
improvement over the status quo.

Greetings,

Andres Freund