Hi, On Wed, Oct 22, 2025 at 9:57 PM Kirill Reshke <[email protected]> wrote: > > Hi! Thank you for your review.
Thank you for the new version! > Im posting new version of 0001 patch of series > > On Tue, 22 Jul 2025 at 15:47, Arseniy Mukhin > <[email protected]> wrote: > > > > Hi, Andrey! > > > > Thank you for working on this! There is a long history of the patch, I > > hope it will be committed soon!) > > > > On Fri, Jul 11, 2025 at 3:39 PM Andrey Borodin <[email protected]> wrote: > > > > > > > > > > > > > On 30 Jun 2025, at 16:34, Andrey Borodin <[email protected]> wrote: > > > > > > > > Please find attached two new steps for amcheck: > > > > 1. A function to verify GiST integrity. This patch is in decent shape, > > > > simply rebased from previous year. > > > > 2. Support on pg_amcheck's side for this function. This patch did not > > > > receive such review attention before. And, perhaps, should be extended > > > > to support existing GIN functions. > > > > > > Here's a version that adds GIN functions to pg_amcheck. > > > IDK, maybe we should split pg_amcheck part into another thread and add > > > there BRIN too... > > > > > > > Speaking of BRIN pg_amcheck, I probably wouldn't merge it with the > > gist/gin pg_amceck patchset because that would create a dependency on > > the amcheck BRIN support patch, which is not clear when it will be > > ready. > > > > > > There are some points about the patch: > > > > 1) There are several typos in verify_gist.c: > > > > downlinks -> downlink (header comment) > > discrepencies -> discrepancies > > Correctess -> Correctness > > hande -> handle > > Initaliaze -> Initialize > > numbmer -> number > > replcaed -> replaced > > aquire -> aqcuire > > > > 2) Copyright year is 2023 in the patch. Time flies:) > > These two are (trivially) fixed. > > > 3) There is the same check in btree and while reviewing the patch I > > realised it should be added to the BRIN amcheck as well. Probably it > > will be needed for GIN someday. What do you think about moving it to > > verify_common? > > > > if (IsolationUsesXactSnapshot() && rel->rd_index->indcheckxmin && > > > > !TransactionIdPrecedes(HeapTupleHeaderGetXmin(rel->rd_indextuple->t_data), > > result->snapshot->xmin)) > > I think this is a good idea. I'm not sure if we should bother with > refactoring in this series though... Great, so maybe we can start a separate thread for this small refactoring. Some work in this direction has been already done in brin amcheck thread [0]. > > > 4) Should we check blknum of the new entry before pushing to the > > stack? Probably we can check if it's a valid blknum and it's not > > outside of the index. This way we can give a more detailed error > > message in case we meet the wrong blknum. > > > > in the split detection code: > > > > ptr->blkno = GistPageGetOpaque(page)->rightlink; > > > > and when we add children of an inner page: > > > > ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid)); > > We indeed need to recheck all bytes in possibly-corrupted indexes, > including downlinks. > But amcheck can be run concurrently with Index insert, which will > change the current index size, so checking is not trivial. > And given it is not checked in already-committed nbtree & GIN amcheck > modules, I suggest not bother with that in this thread. > This can be a separate patch to verify_nbtree. > OK. > > > 6) Several points about 'gistFormNormalizedTuple'. I read the previous > > thread [1] and it seems there is an unfinished discussion about > > normalizing during heapallindexed check. I think normalization needs > > some more work here. > > > > 6a) There is a TODO > > /* pfree(DatumGetPointer(old)); // TODO: this fails. Why? */ > > > > 6b) AFAICS 'compress' is an optional support function. If opclass > > doesn't have a 'compress' function, then 'gistCompressValues' leaves > > such attributes as it is. Here we get attdata from the heap scan, and > > it could be toasted. That means that these checks can result in false > > positives: > > > > gistCompressValues(giststate, r, attdata, isnull, true, compatt); > > ... > > > > for (int i = 0; i < r->rd_att->natts; i++) > > { > > if (VARATT_IS_EXTERNAL(DatumGetPointer(compatt[i]))) > > ereport(ERROR, > > (errcode(ERRCODE_INDEX_CORRUPTED), > > .... > > if (VARATT_IS_COMPRESSED(DatumGetPointer(compatt[i]))) > > { > > if (i < IndexRelationGetNumberOfKeyAttributes(r)) > > ereport(ERROR, > > > > Also 'VARATT_IS_EXTERNAL' check will always result in a false > > positive for toasted include attributes here. Reproducer for it: > > > > DROP TABLE IF EXISTS tbl; > > CREATE TABLE tbl(a point, t text); > > -- disable compression for 't', but let it to be external > > ALTER TABLE tbl ALTER COLUMN t SET STORAGE external ; > > INSERT INTO tbl values (point(random(), random()), repeat('a',3000 )); > > CREATE INDEX tbl_idx ON tbl using gist (a) include (t); > > SELECT gist_index_check('tbl_idx', true); > > > > So I think we need to remove these checks completely. > > > > 6c) Current code doesn't apply normalization for existing index tuples > > during adding to bloom filter, which can result in false positive, > > reproducer: > > Here we use plain storage during index build, then during check we > > have extended storage, which results in different binary > > representation of the same data and we have false positive here. > > > > DROP TABLE IF EXISTS tbl; > > CREATE TABLE tbl(a tsvector); > > CREATE INDEX tbl_idx ON tbl using gist (a) ; > > ALTER TABLE tbl ALTER COLUMN a SET STORAGE plain; > > INSERT INTO tbl values ('a' ::tsvector); > > ALTER TABLE tbl ALTER COLUMN a SET STORAGE extended ; > > SELECT gist_index_check('tbl_idx', true); > > > > 6d) In the end of 'gistFormNormalizedTuple' we have > > > > ItemPointerSetOffsetNumber(&(res->t_tid), 0xffff); > > > > I guess it follows gistFormTuple function, but here we use > > gistFormNormalizedTuple only for leaf tuples and we override > > offsetnumber right after 'gistFormNormalizedTuple' function call, so > > looks like we can drop it. > > > > > > In general I think normalization here can follow the same logic as > > for verify_nbtree. We can even reuse 'bt_normalize_tuple' as a > > normalization function. It handles all corner cases like short varatt, > > differences in compressions etc, that we can have in gist as well. It > > contains just a few lines about btree and everything else valid for > > gist, so we need to modify it a bit. I think we can move it to > > verify_common. Then we need to normalize every existing leaf index > > tuple before adding it to the bloom filter. During the probing phase I > > think we just can use 'gistFormTuple' to build an index tuple and then > > normalize it before probing. What do you think? > > > > Thank you! > > I did a little refactor in patch one, so we can reuse > bt_normalize_tuple. With these changes, your reproducers do not > complain. > I guess `gistFormNormalizedTuple` is now unneeded and its comment no > longer true. index_form_tuple in gist_tuple_present_callback ensures > all attributes are detoasted, and then we do `amcheck_normalize_tuple` > call. > I will remove gistFormNormalizedTuple function in next iteration if > this approach is OK > LGTM. Only one point here: I think maybe we need to move the bt_normalize_tuple comment as well, as now it is more about amcheck_normalize_tuple than bt_normalize_tuple. [0] https://www.postgresql.org/message-id/CAE7r3MKUOGJ0v5-b5fYaF6sxKZvr0J-YXHTJf8u8GUr1tTcvNg%40mail.gmail.com Best regards, Arseniy Mukhin
