> On Mar 13, 2021, at 11:06 AM, Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Sat, Mar 13, 2021 at 10:35 AM Mark Dilger
> <mark.dil...@enterprisedb.com> wrote:
>> The testing strategy I'm using is to corrupt heap and btree pages in schema
>> "public" of the "regression" database created by `make installcheck`, by
>> overwriting random bytes in randomly selected locations on pages after the
>> page header. Then I run `pg_amcheck regression` and see if anything
>> segfaults. Doing this repeatedly, with random bytes and locations within
>> files not the same from one run to the next, I can find the locations of
>> segfaults that are particularly common.
>
> That seems like a reasonable starting point to me.
>
>> The first common problem [1] happens at verify_nbtree.c:1422 when a
>> corruption report is being generated. The generation does not seem entirely
>> safe, and the problematic bit can be avoided
>
>> The temPointerGetBlockNumberNoCheck(tid) seems to be unsafe here.
>
> I think that the best way to further harden verify_nbtree.c at this
> point is to do the most basic validation of IndexTuples in our own new
> variant of a core bufpage.h macro: PageGetItemCareful(). In other
> words, we should invent the IndexTuple equivalent of the existing
> PageGetItemIdCareful() function (which already does the same thing for
> item pointers), and then replace all current PageGetItem() calls with
> PageGetItemCareful() calls -- we ban raw PageGetItem() calls from
> verify_nbtree.c forever.
>
> Here is some of the stuff that could go in PageGetItemCareful(), just
> off the top of my head:
>
> * The existing "if (tupsize != ItemIdGetLength(itemid))" check at the
> top of bt_target_page_check() -- make sure the length from the
> caller's line pointer agrees with IndexTupleSize().
>
> * Basic validation against the index's tuple descriptor -- in
> particular, that varlena headers are basically sane, and that the
> apparent range of datums is safely within the space on the page for
> the tuple.
>
> * Similarly, BTreeTupleGetHeapTID() should not be able to return a
> pointer that doesn't actually point somewhere inside the space that
> the target page has for the IndexTuple.
>
> * No external TOAST pointers, since this is an index AM, and so
> doesn't allow that.
>
> In general this kind of very basic validation should be pushed down to
> the lowest level code, so that it detects the problem as early as
> possible, before slightly higher level code has the opportunity to
> run. Higher level code is always going to be at risk of making
> assumptions about the data not being corrupt, because there is so much
> more of it, and also because it tends to roughly look like idiomatic
> AM code.
Excellent write-up! Thanks!
I'll work on this and get a patch set going if time allows.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company