Peter, I'd like your advice on the following observations, if you'd be so kind:
Using the pg_amcheck command committed yesterday (thanks, Robert! thanks Tom!),
I have begun investigating segfaults that sometimes occur when running the
amcheck routines on intentionally corrupted databases. We've discussed this
before, and there are limits to how much hardening is possible, particularly if
it comes at the expense of backend performance under normal conditions. There
are also serious questions about how much effort to spend defending against
corruption schemes that differ from what is likely to go wrong in the wild.
These segfaults present a real usage problem for pg_amcheck. We made the
decision [3] to not continue checking if we get a FATAL or PANIC error.
Getting a segfault in just one database while checking just one index can abort
a pg_amcheck run that spans multiple databases, tables and indexes, and
therefore is not easy to blow off. Perhaps the decision in [3] was wrong, but
if not, some hardening of amcheck might make this problem less common.
The testing strategy I'm using is to corrupt heap and btree pages in schema
"public" of the "regression" database created by `make installcheck`, by
overwriting random bytes in randomly selected locations on pages after the page
header. Then I run `pg_amcheck regression` and see if anything segfaults.
Doing this repeatedly, with the corrupted bytes and their locations within the
files varying from one run to the next, I can find the locations where
segfaults are particularly common.
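
In case it helps to see the corruption step concretely, it is roughly along
these lines (a from-memory sketch, not the actual test harness; the relation
file path is supplied by the caller and the byte count is arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BLCKSZ 8192             /* default block size on this build */
#define PAGE_HEADER_SIZE 24     /* SizeOfPageHeaderData on this build */

int
main(int argc, char **argv)
{
    const char *path;           /* e.g. a relation file under base/<dboid>/ */
    FILE       *f;
    long        npages;
    long        page;
    long        offset;
    int         i;

    if (argc < 2)
        return 1;
    path = argv[1];
    f = fopen(path, "r+b");
    if (f == NULL)
        return 1;

    srandom(time(NULL));
    fseek(f, 0, SEEK_END);
    npages = ftell(f) / BLCKSZ;
    if (npages == 0)
        return 1;

    /* pick a random page, and a random spot past the page header */
    page = random() % npages;
    offset = PAGE_HEADER_SIZE + random() % (BLCKSZ - PAGE_HEADER_SIZE);
    fseek(f, page * BLCKSZ + offset, SEEK_SET);

    /* scribble a few random bytes there */
    for (i = 0; i < 8 && offset + i < BLCKSZ; i++)
        fputc(random() % 256, f);

    fclose(f);
    return 0;
}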
Admittedly, this is not what is likely to be wrong in real-world installations.
I had a patch as part of the pg_amcheck series that I ultimately abandoned for
v14 given the schedule. It reverts tables and indexes (or portions thereof) to
prior states. I'll probably move on to testing with that in a bit.
The first common problem [1] happens at verify_nbtree.c:1422 when a corruption
report is being generated. The generation does not seem entirely safe, and the
problematic bit can be avoided, though I suspect you could do better than the
brute-force solution I'm using locally:
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index c4ca633918..fa8b7d5163 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -1418,9 +1418,13 @@ bt_target_page_check(BtreeCheckState *state)
                                           OffsetNumberNext(offset));
             itup = (IndexTuple) PageGetItem(state->target, itemid);
             tid = BTreeTupleGetPointsToTID(itup);
+#if 0
             nhtid = psprintf("(%u,%u)",
                              ItemPointerGetBlockNumberNoCheck(tid),
                              ItemPointerGetOffsetNumberNoCheck(tid));
+#else
+            nhtid = "(UNRESOLVED,UNRESOLVED)";
+#endif
 
             ereport(ERROR,
                     (errcode(ERRCODE_INDEX_CORRUPTED),
The ItemPointerGetBlockNumberNoCheck(tid) call seems to be unsafe here. I get
much the same crash at verify_nbtree.c:1136, but it's so similar that I won't
bother to include a crash report for it.
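
For what it's worth, the less brute-force idea I had in mind is a bounds check
on the TID pointer before the psprintf() arguments dereference it; the helper
below is an untested sketch (name invented), relying on verify_nbtree.c's
existing includes:

/*
 * Untested sketch: verify that the ItemPointer returned by
 * BTreeTupleGetPointsToTID() lies within the item as the line pointer
 * describes it, so that a garbage posting-list offset cannot send us off
 * the end of the page.
 */
static bool
tid_within_item(ItemId itemid, IndexTuple itup, ItemPointer tid)
{
    char       *start = (char *) itup;
    char       *end = start + ItemIdGetLength(itemid);

    return (char *) tid >= start &&
        (char *) tid + sizeof(ItemPointerData) <= end;
}

bt_target_page_check() could then fall back to a placeholder string, as in the
#if 0 hack above, only when this check fails.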
The second common problem [2] happens at verify_nbtree.c:2762 where it uses
_bt_compare, which ends up calling pg_detoast_datum_packed on a garbage value.
I'm not sure we can fix that at the level of _bt_compare, since that would have
performance consequences on backends under normal conditions, but perhaps we
could have a function that sanity checks datums, and call that from
invariant_l_offset() prior to _bt_compare? (A rough sketch of what I mean
follows the stack trace below.) I have observed a variant of this crash where
the text datum is not toasted but the comparison crashes anyway:
0   libsystem_platform.dylib      0x00007fff738ec929 _platform_memmove$VARIANT$Haswell + 41
1   postgres                      0x000000010bf1af34 varstr_cmp + 532 (varlena.c:1678)
2   postgres                      0x000000010bf1b6c9 text_cmp + 617 (varlena.c:1770)
3   postgres                      0x000000010bf1bfe5 bttextcmp + 69 (varlena.c:1990)
4   postgres                      0x000000010bf68c87 FunctionCall2Coll + 167 (fmgr.c:1163)
5   postgres                      0x000000010b8a7cb5 _bt_compare + 1445 (nbtsearch.c:721)
6   amcheck.so                    0x000000011525eaeb invariant_l_offset + 123 (verify_nbtree.c:2758)
7   amcheck.so                    0x000000011525cd92 bt_target_page_check + 4754 (verify_nbtree.c:1398)
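
To make the sanity-check idea a bit more concrete, the sort of thing I was
picturing is below; the function name is invented, I haven't tried wiring it
into invariant_l_offset(), and the caller would still have to make sure the
datum pointer itself lies inside the tuple:

/*
 * Untested sketch: reject a varlena datum whose self-reported length cannot
 * fit in the bytes remaining inside the index tuple, before it is handed to
 * _bt_compare().  Uses the usual varlena macros from postgres.h.
 */
static bool
varlena_length_looks_sane(const char *datum, Size bytes_remaining)
{
    if (bytes_remaining < VARHDRSZ_SHORT)
        return false;

    if (VARATT_IS_EXTERNAL(datum))
        return bytes_remaining >= VARHDRSZ_EXTERNAL;

    if (VARATT_IS_SHORT(datum))
        return VARSIZE_SHORT(datum) <= bytes_remaining;

    if (bytes_remaining < VARHDRSZ)
        return false;

    return VARSIZE(datum) <= bytes_remaining;
}

If the check fails, invariant_l_offset() could skip the _bt_compare() call and
report corruption directly.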
[1] postgres_2021-03-13-084305_laptop280-ma-us.crash (attached)
[2] postgres_2021-03-13-094450_laptop280-ma-us.crash (attached)
[3] https://www.postgresql.org/message-id/CA%2BTgmob2c0eM8%2B5kzkXaqdc9XbBCkHmtihSOSk-jCzzT4BFhFQ%40mail.gmail.com

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
