While debugging the B-tree bug that Jeff Janes reported
a new issue came up:
If you reach the xidStopLimit, and try to run VACUUM, it fails with error:
ERROR: database is not accepting commands to avoid wraparound data loss
in database "jjanes"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.
The backtrace looks like this:
#0 errstart (elevel=20, filename=0x9590a0 "varsup.c", lineno=120,
funcname=0x9593f0 <__func__.10455> "GetNewTransactionId",
domain=0x0) at elog.c:249
#1 0x00000000004f7c14 in GetNewTransactionId (isSubXact=0 '\000') at
#2 0x00000000004f86db in AssignTransactionId (s=0xd62900
#3 0x00000000004f84a4 in GetCurrentTransactionId () at xact.c:382
#4 0x000000000062dc1c in vac_truncate_clog (frozenXID=1493663893,
#5 0x000000000062dc06 in vac_update_datfrozenxid () at vacuum.c:888
#6 0x000000000062cdf6 in vacuum (vacstmt=0x29e05e0, relid=0, do_toast=1
bstrategy=0x2a5cc38, for_wraparound=0 '\000', isTopLevel=1 '\001')
#7 0x00000000007a3c55 in standard_ProcessUtility (parsetree=0x29e05e0,
queryString=0x29dfbf8 "vacuum ;", context=PROCESS_UTILITY_TOPLEVEL,
dest=0x29e0988, completionTag=0x7fff9411a490 "") at utility.c:645
So, vac_truncate_clog() tries to get a new transaction ID, which fails
because we've already reached the stop-limit. vac_truncate_clog()
doesn't really need a new XID to be assigned, though, it only uses it to
compare against datfrozenxid to see if wrap-around has already happened,
so it could use ReadNewTransactionId() instead.
Jeff's database seems to have wrapped around already, because after
fixing the above, I get this:
WARNING: some databases have not been vacuumed in over 2 billion
DETAIL: You might have already suffered transaction-wraparound data loss.
We do not truncate clog when wraparound has already happened, so we
never get past that point. Jeff advanced XID counter aggressively with
some custom C code, so hitting the actual wrap-around is a case of
"don't do that". Still, the case is quite peculiar: pg_controldata says
that nextXid is 4/1593661139. The oldest datfrozenxid is equal to that,
1593661139. So ISTM he managed to not just wrap around, but execute 2
billion more transactions after the wraparound and reach datfrozenxid
again. I'm not sure how that happened.
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: