Hi,

Since the multixact equivalent of this problem[1] fell through the cracks on the multixact mega-thread, here is an updated patch that addresses this problem for both pg_subtrans and pg_multixact/offsets using the same approach: always step back one multixact/xid (rather than doing so only if oldest == next, which seemed like an unnecessary complication, and a bit futile since the result of such a test is only an instantaneous snapshot). I've added this to the commitfest[2].

I am also attaching a new set of repro scripts, including a pair to test the case where next multixact/xid == first valid ID (the scripts with 'wraparound' in the name, which use dirty pg_resetxlog tricks to get into that situation). In my previous patch I naively subtracted one, which didn't work for those (even rarer!) cases. The new patch steps over the special ID values.
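To make the "step back one, skipping the special IDs" idea concrete, here is a rough standalone sketch. It is not code from the attached patch: the names (xid_t, step_back_xid, cutoff_beyond_latest) are made up for illustration, FIRST_NORMAL_XID mirrors FirstNormalTransactionId (3), and XIDS_PER_PAGE assumes the default 8192-byte block with 4-byte entries as in pg_subtrans. It also assumes the newest existing page is the one holding the last xid actually issued, and that the bogus message appears when the cutoff page lies beyond that page.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;          /* hypothetical stand-in for TransactionId */

#define FIRST_NORMAL_XID 3u      /* IDs 0..2 are special and never issued */
#define XIDS_PER_PAGE    2048u   /* assumed: 8192-byte block / 4-byte entry */

/* Step back one xid, stepping over the special values on wraparound. */
static xid_t
step_back_xid(xid_t xid)
{
    do
    {
        xid--;                   /* unsigned arithmetic wraps 0 -> 2^32 - 1 */
    } while (xid < FIRST_NORMAL_XID);
    return xid;
}

/* First xid stored on the page that holds the given xid. */
static xid_t
page_start(xid_t xid)
{
    return (xid / XIDS_PER_PAGE) * XIDS_PER_PAGE;
}

/*
 * True if the page for cutoff_xid is logically newer than the newest page
 * that exists, i.e. the page of the last xid issued before next_xid.
 * The comparison is modulo 2^32, so it keeps working across wraparound.
 */
static bool
cutoff_beyond_latest(xid_t next_xid, xid_t cutoff_xid)
{
    xid_t latest = page_start(step_back_xid(next_xid));
    xid_t cutoff = page_start(cutoff_xid);

    return (int32_t) (latest - cutoff) < 0;
}
```

With next xid = 4096 and no running transactions, a cutoff of 4096 points at a page that has not been created yet, so cutoff_beyond_latest(4096, 4096) is true; using step_back_xid(4096) = 4095 as the cutoff lands on the last existing page instead. The same holds for the 'wraparound' case where next xid == 3: naive subtraction would give the special value 2, while stepping over the special IDs gives 0xFFFFFFFF.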
This is a low-priority bug: it just produces low-probability bogus (but perhaps alarming) LOG messages and skips truncation during checkpoints on low-activity systems. There have been occasional reports of these pg_subtrans messages going back as far as 2007 (and Alvaro was barking up the correct tree[3] back in 2010), so I figured it was worth following up.

I also took a look at the pg_clog and pg_commit_ts truncation functions. You could argue that they have the same problem in theory (they pass a page number derived from the oldest xid to SimpleLruTruncate, and maybe there is a way for that to be an xid that hasn't been issued yet), but in practice I don't think it's a reachable condition. They use the frozen xid that is updated by vacuuming, but vacuum itself advances the next xid counter in the process. Is there a path through the vacuum code that ever exposes frozen xid == next xid?

In contrast, for pg_subtrans we use GetOldestXmin(), which is equal to the next xid if there are no running transactions, and for pg_multixact we use the oldest multixact, which can be equal to the next multixact ID after a wraparound vacuum because vacuum itself doesn't always consume multixacts.

[1] http://www.postgresql.org/message-id/CAEepm=0DqAtnM=23oq44bbnwvn3g6+dxx+s5g4jrbp-vy8g...@mail.gmail.com
[2] https://commitfest.postgresql.org/5/265/
[3] http://www.postgresql.org/message-id/1274373980-sup-3...@alvh.no-ip.org

-- 
Thomas Munro
http://www.enterprisedb.com
repro-bogus-multixact-error.sh
Description: Bourne shell script
repro-bogus-subtrans-error.sh
Description: Bourne shell script
repro-bogus-multixact-error-wraparound.sh
Description: Bourne shell script
repro-bogus-subtrans-error-wraparound.sh
Description: Bourne shell script
fix-bogus-truncation-errors.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)