room for improvement in amcheck btree checking?

Mark Dilger Tue, 01 Dec 2020 12:39:24 -0800

Peter,

While working on the pg_amcheck code for [1], I discovered an unexpected 
deficiency in the way btree indexes are checked.  So far as I can tell from the 
docs [2], the deficiency does not violate any promises that amcheck is making, 
but I found it rather surprising all the same.  To reproduce:

1) Create a (possibly empty) table and btree index over the table.
2) Flush buffers and backup a copy of the heap relation file.
3) Load (more) data into the table.
4) Flushing buffers as needed, revert the heap relation file to the backup
previously taken.
5) Run bt_index_check and bt_index_parent_check with and without heapallindexed
and/or rootdescend. Note that the index passes all checks.
6) Run a SQL query that uses a sequential scan on the table and observe no
errors.
7) Run a SQL query that uses an index scan on the table and see that it errors
with something like:

ERROR: could not read block 0 in file "base/13097/16391": read only 0 of
8192 bytes

I found it surprising that even when precisely zero of the tids in the index
exist in the table the index checks all come back clean. The heapallindexed
check is technically running as advertised, checking that all of the zero
tuples in the heap are present in the index. That is a pretty useless check
under this condition, though. Is a "indexallheaped" option (by some less crazy
name) needed?

Users might also run into this problem when a heap relation file gets
erroneously shortened by some number of blocks but not fully truncated, or
perhaps with torn page writes.

Have you already considered and rejected a "indexallheaped" type check?

Background
-------

I have up until recently been focused on corruption caused by twiddling the
bits within heap and index relation pages, but real-world user error, file
system error, and perhaps race conditions in the core postgresql code seem at
least as likely to result in missing or incorrect versions of blocks of
relation files rather than individual bytes within those blocks being wrong.
Per our discussions in [3], not all corruptions that can be created under
laboratory conditions are equally likely to occur in the wild, and it may be
reasonable to only harden the amcheck code against corruptions that are more
likely to happen in actual practice.

To make it easier for tap tests to cover common corruption type scenarios, I
have been extending PostgresNode.pm with functions to perform these kinds of
file system corruptions. I expect to post that work in another thread soon. I
am not embedding any knowledge of the internal structure of heap, index, or
toast relations in PostgresNode, only creating functions to archive versions of
files and perform full or partial reversions of them later.

The ultimate goal of this work is to have sufficient regression tests to
demonstrate that pg_amcheck can be run with default options against a system
corrupted in these common ways without crashing, and with reasonable likelihood
of detecting these common corruptions. Users might understand that hard to
detect corruption will go unnoticed, but it would be harder to explain to them
why, immediately after getting a clean bill of health on their system, a query
bombed out with the sort of error shown above.

[1] https://commitfest.postgresql.org/31/2670/
[2] https://www.postgresql.org/docs/13/amcheck.html
[3]
https://www.postgresql.org/message-id/flat/CAH2-WznaU6HcahLV4Hg-DnhEmW8DuSdYfn3vfWXoj3Me9jq%3DsQ%40mail.gmail.com#0691475da5e9163d21b13fc415095801

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

room for improvement in amcheck btree checking?

Reply via email to