Hi hackers,
We recently ran into an issue where the visibility map of a relation was
corrupt, running Postgres 12.4. The error we'd get when running a SELECT * from
this table is:
could not access status of transaction 3704450152
DETAIL: Could not open file "pg_xact/0DCC": No such file or directory.
On the lists I could find several similar reports, but corruption like this
could obviously have a very wide range of root causes.. see [1] [2] [3] for
example - not all of them have their root cause known.
This particular case was similar to reported cases above, but also has some
differences.
The following query returns ~21.000 rows, which indicates something
inconsistent between the visibility map and the pd_all_visible flag on the page:
select * from pg_check_frozen('tbl');
Looking at one of the affected pages with pageinspect:
=# SELECT
lp,lp_off,lp_flags,lp_len,t_xmin,t_xmax,t_field3,t_ctid,t_infomask2,t_infomask,t_hoff,t_oid
FROM heap_page_items(get_raw_page('tbl', 726127));
┌────┬────────┬──────────┬────────┬────────────┬────────────┬──────────┬────────────┬─────────────┬────────────┬────────┬───────┐
│ lp │ lp_off │ lp_flags │ lp_len │ t_xmin │ t_xmax │ t_field3 │
t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_oid │
├────┼────────┼──────────┼────────┼────────────┼────────────┼──────────┼────────────┼─────────────┼────────────┼────────┼───────┤
│ 1 │ 6328 │ 1 │ 1864 │ 3704450155 │ 3704450155 │ 1 │
(726127,1) │ 249 │ 8339 │ 56 │ ∅ │
│ 2 │ 4464 │ 1 │ 1864 │ 3704450155 │ 3704450155 │ 1 │
(726127,2) │ 249 │ 8339 │ 56 │ ∅ │
│ 3 │ 2600 │ 1 │ 1864 │ 3704450155 │ 3704450155 │ 1 │
(726127,3) │ 249 │ 8339 │ 56 │ ∅ │
│ 4 │ 680 │ 1 │ 1920 │ 3704450155 │ 3704450155 │ 1 │
(726127,4) │ 249 │ 8339 │ 56 │ ∅ │
└────┴────────┴──────────┴────────┴────────────┴────────────┴──────────┴────────────┴─────────────┴────────────┴────────┴───────┘
t_infomask shows that HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID bits are both
unset.
This pg_visibility() call shows the inconsistency between VM and page, with
PD_ALL_VISIBLE=false
=# select * from pg_visibility('tbl', 726127);
┌─────────────┬────────────┬────────────────┐
│ all_visible │ all_frozen │ pd_all_visible │
├─────────────┼────────────┼────────────────┤
│ t │ t │ f │
└─────────────┴────────────┴────────────────┘
Looking at other pages show the same information.
What's interesting is that out of the affected tuples returned by
pg_check_frozen, over 99% belong to 1 transaction (3704450155 as seen above)
and the remaining few are from one other transaction that occurred at roughly
the same time.
I find it hard to believe that this is due to some random bit flipping, because
many pages are affected, but the "incorrect" ones are in total only from two
specific transactions which occurred at roughly the same time. There were also
no server crashes or other known failures around the time of this transaction.
I'm not ruling out any other kind of failure still, but I also cannot really
explain how this could have happened.
The server has PG12.4 with full_page_writes=on, data_checksums=off. It's a
large analytics database. The VM inconsistencies also occur on the streaming
replicas.
I realize these cases are pretty rare and hard to debug, but I wanted to share
the information I found so far here for reference. Maybe someone has an idea
what occurred, or maybe someone in the future finds it useful when he finds
something similar.
I have no idea how the inconsistency between VM and PD_ALL_VISIBLE started -
from looking through the code I can't really find any way how this could occur.
However, for it to lead to the problem described here, I believe there should
be *no* SELECT that touches that particular page after the insert/update
transaction and before the transaction log gets truncated. If the page is read
before the transaction log gets truncated, then the hint bit
HEAP_XMIN_COMMITTED will get set and future reads will succeed regardless of tx
log truncation. One of the replica's had this happen to it: the affected pages
are identical to the primary except that the HEAP_XMIN_COMMITTED flag is set
(note that the VM inconsistency is still there on the replica though:
PD_ALL_VISIBLE=false even though VM shows that all_frozen=all_visible=true).
But I can query these rows on the replica without issues, because it doesn't
check the tx log when it sees that HEAP_XMIN_COMMITTED is set.
-Floris
[1] https://postgrespro.com/list/thread-id/2422376
[2] https://postgrespro.com/list/thread-id/2501800
[3] https://postgrespro.com/list/thread-id/2321949