While debugging the B-tree bug that Jeff Janes reported (http://www.postgresql.org/message-id/CAMkU=1y=vwf07ay+cpqk_7fpihrctmssv9y99sbghitkxpb...@mail.gmail.com), a new issue came up:

If you reach the xidStopLimit, and try to run VACUUM, it fails with error:

jjanes=# vacuum;
ERROR: database is not accepting commands to avoid wraparound data loss in database "jjanes"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.

The backtrace looks like this:

#0  errstart (elevel=20, filename=0x9590a0 "varsup.c", lineno=120,
funcname=0x9593f0 <__func__.10455> "GetNewTransactionId", domain=0x0) at elog.c:249 #1 0x00000000004f7c14 in GetNewTransactionId (isSubXact=0 '\000') at varsup.c:115 #2 0x00000000004f86db in AssignTransactionId (s=0xd62900 <TopTransactionStateData>)
    at xact.c:510
#3  0x00000000004f84a4 in GetCurrentTransactionId () at xact.c:382
#4 0x000000000062dc1c in vac_truncate_clog (frozenXID=1493663893, minMulti=1)
    at vacuum.c:909
#5  0x000000000062dc06 in vac_update_datfrozenxid () at vacuum.c:888
#6 0x000000000062cdf6 in vacuum (vacstmt=0x29e05e0, relid=0, do_toast=1 '\001', bstrategy=0x2a5cc38, for_wraparound=0 '\000', isTopLevel=1 '\001') at vacuum.c:294
#7  0x00000000007a3c55 in standard_ProcessUtility (parsetree=0x29e05e0,
queryString=0x29dfbf8 "vacuum ;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
    dest=0x29e0988, completionTag=0x7fff9411a490 "") at utility.c:645

So, vac_truncate_clog() tries to get a new transaction ID, which fails because we've already reached the stop-limit. vac_truncate_clog() doesn't really need a new XID to be assigned, though, it only uses it to compare against datfrozenxid to see if wrap-around has already happened, so it could use ReadNewTransactionId() instead.


Jeff's database seems to have wrapped around already, because after fixing the above, I get this:

jjanes=# vacuum;
WARNING: some databases have not been vacuumed in over 2 billion transactions
DETAIL:  You might have already suffered transaction-wraparound data loss.
VACUUM

We do not truncate clog when wraparound has already happened, so we never get past that point. Jeff advanced XID counter aggressively with some custom C code, so hitting the actual wrap-around is a case of "don't do that". Still, the case is quite peculiar: pg_controldata says that nextXid is 4/1593661139. The oldest datfrozenxid is equal to that, 1593661139. So ISTM he managed to not just wrap around, but execute 2 billion more transactions after the wraparound and reach datfrozenxid again. I'm not sure how that happened.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to