Hi hackers,

The attached patch adds a new "indexallkeysmatch" option to bt_index_check()
and bt_index_parent_check() that verifies each index tuple points to a
heap tuple with the same key - the reverse of "heapallindexed".

I need this tool to investigate corruption that we may have inflicted on
ourselves, but it might be useful to the community too.
We hit B-tree corruption where index entries stored different keys than
their heap tuples (e.g. "foobar" in the index vs "foo-bar" in the heap).
This happened with UTF-8 Russian locales around hyphens and spaces. The
index structure stayed valid, so existing checks didn't catch it.

The implementation uses a Bloom filter to avoid excessive random heap
I/O. A sequential heap scan fingerprints visible (key, tid) pairs
first. During the index traversal, each leaf tuple is probed against
the filter; only when the filter says "missing" do we fetch the heap
tuple and compare keys. Posting list entries are expanded and checked
individually.
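
To make the flow above concrete, here is a rough Python sketch of the same
idea. It is not the patch's code: BloomFilter, fingerprint() and
check_index_keys() are made-up names, the heap is modelled as a dict of
tid -> key holding only visible tuples, and the index as (key, posting list)
pairs. Handling of index entries whose heap tuple is absent is left out.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; sizes are arbitrary assumptions for the sketch."""

    def __init__(self, nbits=1 << 16, nhashes=4):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, item):
        # Derive nhashes bit positions by salting one hash function.
        for seed in range(self.nhashes):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "little") % self.nbits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def fingerprint(key, tid):
    # Hypothetical encoding of a (key, tid) pair as bytes.
    return repr((key, tid)).encode("utf-8")


def check_index_keys(heap, index):
    """Return (tid, index_key, heap_key) for every mismatching index entry.

    heap:  dict mapping tid -> key, visible tuples only (assumption)
    index: iterable of (key, [tid, ...]) leaf entries with posting lists
    """
    # Pass 1: sequential "heap scan" fingerprints every (key, tid) pair.
    bloom = BloomFilter()
    for tid, key in heap.items():
        bloom.add(fingerprint(key, tid))

    # Pass 2: "index traversal" probes each (key, tid) against the filter.
    mismatches = []
    for index_key, posting_list in index:
        for tid in posting_list:  # posting list entries checked one by one
            if bloom.might_contain(fingerprint(index_key, tid)):
                continue  # probably matches, no heap fetch needed
            heap_key = heap.get(tid)  # the only random heap access
            if heap_key is not None and heap_key != index_key:
                mismatches.append((tid, index_key, heap_key))
    return mismatches


if __name__ == "__main__":
    heap = {(0, 1): "foo-bar", (0, 2): "baz", (0, 3): "baz"}
    index = [("baz", [(0, 2), (0, 3)]), ("foobar", [(0, 1)])]
    print(check_index_keys(heap, index))
```

Note that a Bloom filter has no false negatives, so a "missing" answer is
always worth a heap fetch, while a false positive can let a corrupted entry
slip through. The filter size trades memory for that miss rate.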

When both heapallindexed and indexallkeysmatch are enabled, the heap
is scanned twice. Combining the two checks into one pass would complicate
the code and risk introducing bugs.

The patch also includes a TAP test that creates corruption by swapping the
function behind an expression index and checks that it is detected.
One could object to relying on this bug (corrupting an index by changing its
expression) in tests, but existing tests already use it, so I reused it too.

WDYT? Would you like to see it on CF, or do we have enough amcheck patches
there already and it's better to postpone it to v20?


Best regards, Andrey Borodin.

Attachment: v1-0001-amcheck-add-indexallkeysmatch-verification-for-B-.patch