On 6/9/25 00:14, Tomas Vondra wrote:
> ...
>
> I propose to split it like this, into three parts, each addressing a
> particular type of mistake:
> 
> 1) gin_check_posting_tree_parent_keys_consistency
> 
> 2) gin_check_parent_keys_consistency / att comparisons
> 
> 3) gin_check_parent_keys_consistency / setting ptr->parenttup (at the end)
> 
> Does this make sense to you? If yes, can you split the patch series like
> this, including a commit message for each part, explaining the fix? We'd
> need the commit message even with a single patch, ofc.
> 
The attached v5 patch splits it along these lines, except that the extra
0001 part merely adds a multicolumn index into the regression test. The
0002-0004 parts are ordered to match the TAP test, i.e. it adds tests.

I've copied the points from the report to the commit messages, but this
needs cleanup/rephrasing, to make it readable. Could you look into
that?Of course, if you think the patches should be split differently,
feel free to move stuff.

And as I said before - if you feel the issues are too intertwined and
can't be split like this (or it just doesn't make sense), please speak
up. We can commit that as a single patch. It still needs the commit
message, though.

regards

-- 
Tomas Vondra
From e90dd32a4260c9ecb7aa5584de5d7f393ac2729b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Mon, 9 Jun 2025 01:42:52 +0200
Subject: [PATCH v5 1/4] amcheck: Add gin_index_check on a multicolumn index

Extend the amcheck regression tests with a gin_index_check() check on
a small multicolumn index.

TODO briefly explain this increases test coverage

Author: Arseniy Mukhin <arseniy.mukhin....@gmail.com>
Discussion: https://postgr.es/m/CAE7r3MJ611B9TE=yqbbncewp7-k64vws+sjk7xf6fjux77u...@mail.gmail.com
---
 contrib/amcheck/expected/check_gin.out | 12 ++++++++++++
 contrib/amcheck/sql/check_gin.sql      | 10 ++++++++++
 2 files changed, 22 insertions(+)

diff --git a/contrib/amcheck/expected/check_gin.out b/contrib/amcheck/expected/check_gin.out
index b4f0b110747..8dd01ced8d1 100644
--- a/contrib/amcheck/expected/check_gin.out
+++ b/contrib/amcheck/expected/check_gin.out
@@ -76,3 +76,15 @@ SELECT gin_index_check('gin_check_jsonb_idx');
 
 -- cleanup
 DROP TABLE gin_check_jsonb;
+-- Test GIN multicolumn index
+CREATE TABLE "gin_check_multicolumn"(a text[], b text[]);
+INSERT INTO gin_check_multicolumn (a,b) values ('{a,c,e}','{b,d,f}');
+CREATE INDEX "gin_check_multicolumn_idx" on gin_check_multicolumn USING GIN(a,b);
+SELECT gin_index_check('gin_check_multicolumn_idx');
+ gin_index_check 
+-----------------
+ 
+(1 row)
+
+-- cleanup
+DROP TABLE gin_check_multicolumn;
diff --git a/contrib/amcheck/sql/check_gin.sql b/contrib/amcheck/sql/check_gin.sql
index 66f42c34311..11caed3d6a8 100644
--- a/contrib/amcheck/sql/check_gin.sql
+++ b/contrib/amcheck/sql/check_gin.sql
@@ -50,3 +50,13 @@ SELECT gin_index_check('gin_check_jsonb_idx');
 
 -- cleanup
 DROP TABLE gin_check_jsonb;
+
+-- Test GIN multicolumn index
+CREATE TABLE "gin_check_multicolumn"(a text[], b text[]);
+INSERT INTO gin_check_multicolumn (a,b) values ('{a,c,e}','{b,d,f}');
+CREATE INDEX "gin_check_multicolumn_idx" on gin_check_multicolumn USING GIN(a,b);
+
+SELECT gin_index_check('gin_check_multicolumn_idx');
+
+-- cleanup
+DROP TABLE gin_check_multicolumn;
-- 
2.49.0

From 7e6913007e00d8f8a6f01f87972492e3c2a925a3 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Mon, 9 Jun 2025 01:44:59 +0200
Subject: [PATCH v5 2/4] patch 1: gin_check_parent_keys_consistency

2) Check don't use attnum in comparisons, but for multicolumn indexes
attnum defines order. When we compare max entry page key
with parent key we ignore attnum. It means we occasionally can try to
compare keys of different columns.
While checking order within the same page we skip checking order for
tuples with different attnums now,
but it can be checked. Fix is easy: using ginCompareAttEntries()
instead of ginCompareEntries().

3) Here a code of the split detection

       if (rightlink != InvalidBlockNumber &&
             ginCompareEntries(&state, attnum, page_max_key,
                           page_max_key_category, parent_key,
                           parent_key_category) > 0)
          {
             /* split page detected, install right link to the stack */

Condition seems not right, because the child page max item key never
can be bigger then parent key.
It can be equal to the parentkey, and it means that there was no split
and the parent key that we cached in the stack is still
relevant. Or it could be less then cached parent key and it means that
split took place and old max item key moved to the
right neighbour and current page max item key should be less then
cached parent key. So I think we should replace > with <.

4) Here is the code for checking the order within the entry page.

           /*
           * First block is metadata, skip order check. Also, never check
           * for high key on rightmost page, as this key is not really
           * stored explicitly.
           *
           * Also make sure to not compare entries for different attnums,
           * which may be stored on the same page.
           */
          if (i != FirstOffsetNumber && attnum == prev_attnum &&
stack->blkno != GIN_ROOT_BLKNO &&
             !(i == maxoff && rightlink == InvalidBlockNumber))
          {
             prev_key = gintuple_get_key(&state, prev_tuple,
&prev_key_category);
             if (ginCompareEntries(&state, attnum, prev_key,
                              prev_key_category, current_key,
                              current_key_category) >= 0)

We skip checking the order for the root page, it's not clear why.
Probably there is some mess with the meta page, because
comment says "First block is metadata, skip order check". So I think
we can remove

                    stack->blkno != GIN_ROOT_BLKNO

5) The same place as 4). We skip checking the order for the high key
on the rightmost page, as this key is not really stored explicitly,
but for leaf pages all keys are stored explicitly, so we can check the
order for the last item of the leaf page too.
So I think we can change the condition to this:

            !(i == maxoff && rightlink == InvalidBlockNumber &&
!GinPageIsLeaf(page))

11) When we compare entry tree max page key with parent key:

             if (ginCompareAttEntries(&state, attnum, current_key,
                              current_key_category, parent_key_attnum,
                                      parent_key, parent_key_category) > 0)
             {
                /*
                 * There was a discrepancy between parent and child
                 * tuples. We need to verify it is not a result of
                 * concurrent call of gistplacetopage(). So, lock parent
                 * and try to find downlink for current page. It may be
                 * missing due to concurrent page split, this is OK.
                 */
                pfree(stack->parenttup);
                stack->parenttup = gin_refind_parent(rel, stack->parentblk,
                                            stack->blkno, strategy);

I think we can remove gin_refind_parent() and do ereport right away here.
The same logic as with 3). AFAIK it's impossible to have a child item
with a key that is higher than the cached parent key.
Parent key bounds what keys we can insert into the child page, so it
seems there is no way how they can appear there.
---
 contrib/amcheck/meson.build         |   1 +
 contrib/amcheck/t/006_verify_gin.pl | 197 ++++++++++++++++++++++++++++
 contrib/amcheck/verify_gin.c        |  37 +++---
 3 files changed, 217 insertions(+), 18 deletions(-)
 create mode 100644 contrib/amcheck/t/006_verify_gin.pl

diff --git a/contrib/amcheck/meson.build b/contrib/amcheck/meson.build
index b33e8c9b062..1f0c347ed54 100644
--- a/contrib/amcheck/meson.build
+++ b/contrib/amcheck/meson.build
@@ -49,6 +49,7 @@ tests += {
       't/003_cic_2pc.pl',
       't/004_verify_nbtree_unique.pl',
       't/005_pitr.pl',
+      't/006_verify_gin.pl',
     ],
   },
 }
diff --git a/contrib/amcheck/t/006_verify_gin.pl b/contrib/amcheck/t/006_verify_gin.pl
new file mode 100644
index 00000000000..8c5975d2e37
--- /dev/null
+++ b/contrib/amcheck/t/006_verify_gin.pl
@@ -0,0 +1,197 @@
+
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+my $node;
+my $blksize;
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('test');
+$node->init(no_data_checksums => 1);
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$blksize = int($node->safe_psql('postgres', 'SHOW block_size;'));
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+$node->safe_psql(
+	'postgres', q(
+		CREATE OR REPLACE FUNCTION  random_string( INT ) RETURNS text AS $$
+		SELECT string_agg(substring('0123456789bcdfghjkmnpqrstvwxyz', ceil(random() * 30)::integer, 1), '') from generate_series(1, $1);
+		$$ LANGUAGE SQL;));
+
+# Tests
+invalid_entry_order_leaf_page_test();
+invalid_entry_order_inner_page_test();
+invalid_entry_columns_order_test();
+
+sub invalid_entry_order_leaf_page_test
+{
+	my $relname = "test";
+	my $indexname = "test_gin_idx";
+
+	$node->safe_psql(
+		'postgres', qq(
+		DROP TABLE IF EXISTS $relname;
+		CREATE TABLE $relname (a text[]);
+		CREATE INDEX $indexname ON $relname USING gin (a);
+		INSERT INTO $relname (a) VALUES ('{aaaaa,bbbbb}');
+		SELECT gin_clean_pending_list('$indexname');
+	 ));
+	my $relpath = relation_filepath($indexname);
+
+	$node->stop;
+
+	my $blkno = 1;  # root
+
+	# produce wrong order by replacing aaaaa with ccccc
+	string_replace_block(
+		$relpath,
+		"aaaaa",
+		'"ccccc"',
+		$blksize,
+		$blkno
+	);
+
+	$node->start;
+
+	my ($result, $stdout, $stderr) = $node->psql('postgres', qq(SELECT gin_index_check('$indexname')));
+	ok($stderr =~ "index \"$indexname\" has wrong tuple order on entry tree page, block 1, offset 2, rightlink 4294967295");
+}
+
+sub invalid_entry_order_inner_page_test
+{
+	my $relname = "test";
+	my $indexname = "test_gin_idx";
+
+	$node->safe_psql(
+		'postgres', qq(
+		DROP TABLE IF EXISTS $relname;
+		CREATE TABLE $relname (a text[]);
+		CREATE INDEX $indexname ON $relname USING gin (a);
+		INSERT INTO $relname (a) VALUES (('{' || 'pppppppppp' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'qqqqqqqqqq' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'rrrrrrrrrr' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'ssssssssss' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'tttttttttt' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'uuuuuuuuuu' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'vvvvvvvvvv' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'wwwwwwwwww' || random_string(1870) ||'}')::text[]);
+		SELECT gin_clean_pending_list('$indexname');
+	));
+	my $relpath = relation_filepath($indexname);
+
+	$node->stop;
+
+	my $blkno = 1;  # root
+
+	# we have rrrrrrrrr... and tttttttttt... as keys in the root, so produce wrong order by replacing rrrrrrrrrr....
+	string_replace_block(
+		$relpath,
+		"rrrrrrrrrr",
+		'"zzzzzzzzzz"',
+		$blksize,
+		$blkno
+	);
+
+	$node->start;
+
+	my ($result, $stdout, $stderr) = $node->psql('postgres', qq(SELECT gin_index_check('$indexname')));
+	ok($stderr =~ "index \"$indexname\" has wrong tuple order on entry tree page, block 1, offset 2, rightlink 4294967295");
+}
+
+sub invalid_entry_columns_order_test
+{
+	my $relname = "test";
+	my $indexname = "test_gin_idx";
+
+	$node->safe_psql(
+		'postgres', qq(
+		DROP TABLE IF EXISTS $relname;
+		CREATE TABLE $relname (a text[],b text[]);
+		CREATE INDEX $indexname ON $relname USING gin (a,b);
+		INSERT INTO $relname (a,b) VALUES ('{aaa}','{bbb}');
+		SELECT gin_clean_pending_list('$indexname');
+	));
+	my $relpath = relation_filepath($indexname);
+
+	$node->stop;
+
+	my $blkno = 1;  # root
+
+	# mess column numbers
+	# root items order before: (1,aaa), (2,bbb)
+	# root items order after:  (2,aaa), (1,bbb)
+	my $attrno_1 = pack('s', 1);
+	my $attrno_2 = pack('s', 2);
+
+	my $find = qr/($attrno_1)(.)(aaa)/s;
+	my $replace = '"' . $attrno_2 . '$2$3"';
+	string_replace_block(
+		$relpath,
+		$find,
+		$replace,
+		$blksize,
+		$blkno
+	);
+
+	$find = qr/($attrno_2)(.)(bbb)/s;
+	$replace = '"' . $attrno_1 . '$2$3"';
+	string_replace_block(
+		$relpath,
+		$find,
+		$replace,
+		$blksize,
+		$blkno
+	);
+
+	$node->start;
+
+	my ($result, $stdout, $stderr) = $node->psql('postgres', qq(SELECT gin_index_check('$indexname')));
+	ok($stderr =~ "index \"$indexname\" has wrong tuple order on entry tree page, block 1, offset 2, rightlink 4294967295");
+}
+
+# Returns the filesystem path for the named relation.
+sub relation_filepath
+{
+	my ($relname) = @_;
+
+	my $pgdata = $node->data_dir;
+	my $rel = $node->safe_psql('postgres',
+		qq(SELECT pg_relation_filepath('$relname')));
+	die "path not found for relation $relname" unless defined $rel;
+	return "$pgdata/$rel";
+}
+
+sub string_replace_block
+{
+	my ($filename, $find, $replace, $blksize, $blkno) = @_;
+
+	my $fh;
+	open($fh, '+<', $filename) or BAIL_OUT("open failed: $!");
+	binmode $fh;
+
+	my $offset = $blkno * $blksize;
+	my $buffer;
+
+	sysseek($fh, $offset, 0) or BAIL_OUT("seek failed: $!");
+	sysread($fh, $buffer, $blksize) or BAIL_OUT("read failed: $!");
+
+	$buffer =~ s/$find/$replace/gee;
+
+	sysseek($fh, $offset, 0) or BAIL_OUT("seek failed: $!");
+	syswrite($fh, $buffer) or BAIL_OUT("write failed: $!");
+
+	close($fh) or BAIL_OUT("close failed: $!");
+
+	return;
+}
+
+done_testing();
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index b5f363562e3..26b98571b56 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -463,17 +463,18 @@ gin_check_parent_keys_consistency(Relation rel,
 			Datum		parent_key = gintuple_get_key(&state,
 													  stack->parenttup,
 													  &parent_key_category);
+			OffsetNumber parent_key_attnum = gintuple_get_attrnum(&state, stack->parenttup);
 			ItemId		iid = PageGetItemIdCareful(rel, stack->blkno,
 												   page, maxoff);
 			IndexTuple	idxtuple = (IndexTuple) PageGetItem(page, iid);
-			OffsetNumber attnum = gintuple_get_attrnum(&state, idxtuple);
+			OffsetNumber page_max_key_attnum = gintuple_get_attrnum(&state, idxtuple);
 			GinNullCategory page_max_key_category;
 			Datum		page_max_key = gintuple_get_key(&state, idxtuple, &page_max_key_category);
 
 			if (rightlink != InvalidBlockNumber &&
-				ginCompareEntries(&state, attnum, page_max_key,
-								  page_max_key_category, parent_key,
-								  parent_key_category) > 0)
+				ginCompareAttEntries(&state, page_max_key_attnum, page_max_key,
+									 page_max_key_category, parent_key_attnum,
+									 parent_key, parent_key_category) < 0)
 			{
 				/* split page detected, install right link to the stack */
 				GinScanItem *ptr;
@@ -528,20 +529,18 @@ gin_check_parent_keys_consistency(Relation rel,
 			current_key = gintuple_get_key(&state, idxtuple, &current_key_category);
 
 			/*
-			 * First block is metadata, skip order check. Also, never check
-			 * for high key on rightmost page, as this key is not really
-			 * stored explicitly.
+			 * Never check for high key on rightmost inner page, as this key
+			 * is not really stored explicitly.
 			 *
 			 * Also make sure to not compare entries for different attnums,
 			 * which may be stored on the same page.
 			 */
-			if (i != FirstOffsetNumber && attnum == prev_attnum && stack->blkno != GIN_ROOT_BLKNO &&
-				!(i == maxoff && rightlink == InvalidBlockNumber))
+			if (i != FirstOffsetNumber && !(i == maxoff && rightlink == InvalidBlockNumber && !GinPageIsLeaf(page)))
 			{
 				prev_key = gintuple_get_key(&state, prev_tuple, &prev_key_category);
-				if (ginCompareEntries(&state, attnum, prev_key,
-									  prev_key_category, current_key,
-									  current_key_category) >= 0)
+				if (ginCompareAttEntries(&state, prev_attnum, prev_key,
+										 prev_key_category, attnum,
+										 current_key, current_key_category) >= 0)
 					ereport(ERROR,
 							(errcode(ERRCODE_INDEX_CORRUPTED),
 							 errmsg("index \"%s\" has wrong tuple order on entry tree page, block %u, offset %u, rightlink %u",
@@ -556,13 +555,14 @@ gin_check_parent_keys_consistency(Relation rel,
 				i == maxoff)
 			{
 				GinNullCategory parent_key_category;
+				OffsetNumber parent_key_attnum = gintuple_get_attrnum(&state, stack->parenttup);
 				Datum		parent_key = gintuple_get_key(&state,
 														  stack->parenttup,
 														  &parent_key_category);
 
-				if (ginCompareEntries(&state, attnum, current_key,
-									  current_key_category, parent_key,
-									  parent_key_category) > 0)
+				if (ginCompareAttEntries(&state, attnum, current_key,
+										 current_key_category, parent_key_attnum,
+										 parent_key, parent_key_category) > 0)
 				{
 					/*
 					 * There was a discrepancy between parent and child
@@ -581,6 +581,7 @@ gin_check_parent_keys_consistency(Relation rel,
 							 stack->blkno, stack->parentblk);
 					else
 					{
+						parent_key_attnum = gintuple_get_attrnum(&state, stack->parenttup);
 						parent_key = gintuple_get_key(&state,
 													  stack->parenttup,
 													  &parent_key_category);
@@ -589,9 +590,9 @@ gin_check_parent_keys_consistency(Relation rel,
 						 * Check if it is properly adjusted. If succeed,
 						 * proceed to the next key.
 						 */
-						if (ginCompareEntries(&state, attnum, current_key,
-											  current_key_category, parent_key,
-											  parent_key_category) > 0)
+						if (ginCompareAttEntries(&state, attnum, current_key,
+												 current_key_category, parent_key_attnum,
+												 parent_key, parent_key_category) > 0)
 							ereport(ERROR,
 									(errcode(ERRCODE_INDEX_CORRUPTED),
 									 errmsg("index \"%s\" has inconsistent records on page %u offset %u",
-- 
2.49.0

From f967749e3021e2c3b5c2680275f35aa6d428abf9 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Mon, 9 Jun 2025 01:45:56 +0200
Subject: [PATCH v5 3/4] patch 2: gin_check_parent_keys_consistency

1) When we add new items to the entry tree stack, ptr->parenttup is always null
because GinPageGetOpaque(page)->rightlink is never NULL.

             /* last tuple in layer has no high key */
                if (i != maxoff && !GinPageGetOpaque(page)->rightlink)
                    ptr->parenttup = CopyIndexTuple(idxtuple);
                else
                    ptr->parenttup = NULL;

Right way to check if entry doesn't have right neighbour is

            GinPageGetOpaque(page)->rightlink == InvalidBlockNumber

But even if we fix it, the condition would not do what the comment
says. If we want to have NULL as parenttup
only for the last tuple of the btree layer, the right check would be:

                if (i == maxoff && rightlink == InvalidBlockNumber)
                    ptr->parenttup = NULL;
             else
                    ptr->parenttup = CopyIndexTuple(idxtuple);

8) In the function gin_refind_parent() the code

                if (ItemPointerGetBlockNumber(&(itup->t_tid)) == childblkno)

triggers Assert(ItemPointerIsValid(pointer)) within the
ItemPointerGetBlockNumber(), because itup->t_tid could be invalid.
AFAIS GIN code uses a special method GinGetDownlink(itup) that avoids
this Assert. So we can use it here too.

Please find the attached patch with fixes for issues above. Also there
are added regression test for multicolumn index,
and several tap tests with some basic corrupted index cases. I'm not
sure if it's the right way to write such tests and would
be glad to hear any feedback, especially about
invalid_entry_columns_order_test() where it seems important to
preserve
byte ordering. Also all tests expect standard page size 8192 now.
---
 contrib/amcheck/t/006_verify_gin.pl | 78 +++++++++++++++++++++++++++++
 contrib/amcheck/verify_gin.c        |  8 +--
 2 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/contrib/amcheck/t/006_verify_gin.pl b/contrib/amcheck/t/006_verify_gin.pl
index 8c5975d2e37..a999a13d183 100644
--- a/contrib/amcheck/t/006_verify_gin.pl
+++ b/contrib/amcheck/t/006_verify_gin.pl
@@ -31,6 +31,8 @@ $node->safe_psql(
 invalid_entry_order_leaf_page_test();
 invalid_entry_order_inner_page_test();
 invalid_entry_columns_order_test();
+inconsistent_with_parent_key__parent_key_corrupted_test();
+inconsistent_with_parent_key__child_key_corrupted_test();
 
 sub invalid_entry_order_leaf_page_test
 {
@@ -158,6 +160,82 @@ sub invalid_entry_columns_order_test
 	ok($stderr =~ "index \"$indexname\" has wrong tuple order on entry tree page, block 1, offset 2, rightlink 4294967295");
 }
 
+sub inconsistent_with_parent_key__parent_key_corrupted_test
+{
+	my $relname = "test";
+	my $indexname = "test_gin_idx";
+
+	$node->safe_psql(
+		'postgres', qq(
+		DROP TABLE IF EXISTS $relname;
+		CREATE TABLE $relname (a text[]);
+		CREATE INDEX $indexname ON $relname USING gin (a);
+		INSERT INTO $relname (a) VALUES (('{' || 'llllllllll' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'mmmmmmmmmm' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'nnnnnnnnnn' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'xxxxxxxxxx' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'yyyyyyyyyy' || random_string(1870) ||'}')::text[]);
+		SELECT gin_clean_pending_list('$indexname');
+	));
+	my $relpath = relation_filepath($indexname);
+
+	$node->stop;
+
+	my $blkno = 1;  # root
+
+	# we have nnnnnnnnnn... as parent key in the root, so replace it with something smaller then child's keys
+	string_replace_block(
+		$relpath,
+		"nnnnnnnnnn",
+		'"aaaaaaaaaa"',
+		$blksize,
+		$blkno
+	);
+
+	$node->start;
+
+	my ($result, $stdout, $stderr) = $node->psql('postgres', qq(SELECT gin_index_check('$indexname')));
+	ok($stderr =~ "index \"$indexname\" has inconsistent records on page 5 offset 3");
+}
+
+sub inconsistent_with_parent_key__child_key_corrupted_test
+{
+	my $relname = "test";
+	my $indexname = "test_gin_idx";
+
+	$node->safe_psql(
+		'postgres', qq(
+		DROP TABLE IF EXISTS $relname;
+		CREATE TABLE $relname (a text[]);
+		CREATE INDEX $indexname ON $relname USING gin (a);
+		INSERT INTO $relname (a) VALUES (('{' || 'llllllllll' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'mmmmmmmmmm' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'nnnnnnnnnn' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'xxxxxxxxxx' || random_string(1870) ||'}')::text[]);
+		INSERT INTO $relname (a) VALUES (('{' || 'yyyyyyyyyy' || random_string(1870) ||'}')::text[]);
+		SELECT gin_clean_pending_list('$indexname');
+	 ));
+	my $relpath = relation_filepath($indexname);
+
+	$node->stop;
+
+	my $blkno = 5;  # leaf
+
+	# we have nnnnnnnnnn... as parent key in the root, so replace child key with something bigger
+	string_replace_block(
+		$relpath,
+		"nnnnnnnnnn",
+		'"pppppppppp"',
+		$blksize,
+		$blkno
+	);
+
+	$node->start;
+
+	my ($result, $stdout, $stderr) = $node->psql('postgres', qq(SELECT gin_index_check('$indexname')));
+	ok($stderr =~ "index \"$indexname\" has inconsistent records on page 5 offset 3");
+}
+
 # Returns the filesystem path for the named relation.
 sub relation_filepath
 {
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 26b98571b56..8f6a5410cb7 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -609,10 +609,10 @@ gin_check_parent_keys_consistency(Relation rel,
 				ptr = (GinScanItem *) palloc(sizeof(GinScanItem));
 				ptr->depth = stack->depth + 1;
 				/* last tuple in layer has no high key */
-				if (i != maxoff && !GinPageGetOpaque(page)->rightlink)
-					ptr->parenttup = CopyIndexTuple(idxtuple);
-				else
+				if (i == maxoff && rightlink == InvalidBlockNumber)
 					ptr->parenttup = NULL;
+				else
+					ptr->parenttup = CopyIndexTuple(idxtuple);
 				ptr->parentblk = stack->blkno;
 				ptr->blkno = GinGetDownlink(idxtuple);
 				ptr->parentlsn = lsn;
@@ -750,7 +750,7 @@ gin_refind_parent(Relation rel, BlockNumber parentblkno,
 		ItemId		p_iid = PageGetItemIdCareful(rel, parentblkno, parentpage, o);
 		IndexTuple	itup = (IndexTuple) PageGetItem(parentpage, p_iid);
 
-		if (ItemPointerGetBlockNumber(&(itup->t_tid)) == childblkno)
+		if (GinGetDownlink(itup) == childblkno)
 		{
 			/* Found it! Make copy and return it */
 			result = CopyIndexTuple(itup);
-- 
2.49.0

From 3f19acdaaeaa7db51ef66adf58befc5598339683 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Mon, 9 Jun 2025 01:46:13 +0200
Subject: [PATCH v5 4/4] patch 3:
 gin_check_posting_tree_parent_keys_consistency

6) In posting tree parent key check part:

              /*
              * Check if this tuple is consistent with the downlink in the
              * parent.
              */
             if (stack->parentblk != InvalidBlockNumber && i == maxoff &&
                ItemPointerCompare(&stack->parentkey, &posting_item->key) < 0)
                ereport(ERROR,
                      ...
Here we don't check if stack->parentkey is valid, so sometimes we
compare invalid parentkey (because we can have
valid parentblk and invalid parentkey the same time). Invalid
parentkey is always bigger, so the code never triggers
ereport, but it doesn't look right. so probably we can rewrite it this way:

                if (i == maxoff && ItemPointerIsValid(&stack->parentkey) &&
                ItemPointerCompare(&stack->parentkey, &posting_item->key) < 0)

7) When producing stack entries for posting tree check, we set parent
key like this:

             /*
              * Set rightmost parent key to invalid item pointer. Its value
              * is 'Infinity' and not explicitly stored.
              */
             if (rightlink == InvalidBlockNumber)
                ItemPointerSetInvalid(&ptr->parentkey);
             else
                ptr->parentkey = posting_item->key;

We set invalid parent key for all items of the rightmost page. But
it's the only rightmost item that doesn't have an explicit
parentkey (actually the comment says exactly this, but the code does a
different thing). All others have an explicit parent
key and we can set it. So fix can look like this:

                if (rightlink == InvalidBlockNumber && i == maxoff)
                ItemPointerSetInvalid(&ptr->parentkey);
             else
                ptr->parentkey = posting_item->key;

But for (rightlink == InvalidBlockNumber && i == maxoff)
posting_item->key is always (0,0) (we check it a little bit earlier),
so I think we can simplify it:

                ptr->parentkey = posting_item->key;
---
 contrib/amcheck/t/006_verify_gin.pl | 39 +++++++++++++++++++++++++++++
 contrib/amcheck/verify_gin.c        | 12 +++------
 2 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/contrib/amcheck/t/006_verify_gin.pl b/contrib/amcheck/t/006_verify_gin.pl
index a999a13d183..e8afeb038fc 100644
--- a/contrib/amcheck/t/006_verify_gin.pl
+++ b/contrib/amcheck/t/006_verify_gin.pl
@@ -33,6 +33,7 @@ invalid_entry_order_inner_page_test();
 invalid_entry_columns_order_test();
 inconsistent_with_parent_key__parent_key_corrupted_test();
 inconsistent_with_parent_key__child_key_corrupted_test();
+inconsistent_with_parent_key__parent_key_corrupted_posting_tree_test();
 
 sub invalid_entry_order_leaf_page_test
 {
@@ -236,6 +237,44 @@ sub inconsistent_with_parent_key__child_key_corrupted_test
 	ok($stderr =~ "index \"$indexname\" has inconsistent records on page 5 offset 3");
 }
 
+sub inconsistent_with_parent_key__parent_key_corrupted_posting_tree_test
+{
+	my $relname = "test";
+	my $indexname = "test_gin_idx";
+
+	$node->safe_psql(
+		'postgres', qq(
+		DROP TABLE IF EXISTS $relname;
+		CREATE TABLE $relname (a text[]);
+		INSERT INTO $relname (a) select ('{aaaaa}') from generate_series(1,10000);
+		CREATE INDEX $indexname ON $relname USING gin (a);
+	));
+	my $relpath = relation_filepath($indexname);
+
+	$node->stop;
+
+	my $blkno = 2;  # posting tree root
+
+	# we have a posting tree for 'aaaaa' key with the root at 2nd block
+	# and two leaf pages 3 and 4. for 4th page high key is (65,52), so let's make it a little bit
+	# smaller, so that there are tid's in leaf page that are larger then the new high key.
+	my $find = pack('S', 65) . pack('S', 52);
+	my $replace = '"' . pack('S', 64) . pack('S', 52) . '"';
+	string_replace_block(
+		$relpath,
+		$find,
+		$replace,
+		$blksize,
+		$blkno
+	);
+
+	$node->start;
+
+	my ($result, $stdout, $stderr) = $node->psql('postgres', qq(SELECT gin_index_check('$indexname')));
+	ok($stderr =~ "index \"$indexname\": tid exceeds parent's high key in postingTree leaf on block 4");
+}
+
+
 # Returns the filesystem path for the named relation.
 sub relation_filepath
 {
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 8f6a5410cb7..427cf1669a6 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -346,7 +346,7 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
 				 * Check if this tuple is consistent with the downlink in the
 				 * parent.
 				 */
-				if (stack->parentblk != InvalidBlockNumber && i == maxoff &&
+				if (i == maxoff && ItemPointerIsValid(&stack->parentkey) &&
 					ItemPointerCompare(&stack->parentkey, &posting_item->key) < 0)
 					ereport(ERROR,
 							(errcode(ERRCODE_INDEX_CORRUPTED),
@@ -359,14 +359,10 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
 				ptr->depth = stack->depth + 1;
 
 				/*
-				 * Set rightmost parent key to invalid item pointer. Its value
-				 * is 'Infinity' and not explicitly stored.
+				 * The rightmost parent key is always invalid item pointer.
+				 * Its value is 'Infinity' and not explicitly stored.
 				 */
-				if (rightlink == InvalidBlockNumber)
-					ItemPointerSetInvalid(&ptr->parentkey);
-				else
-					ptr->parentkey = posting_item->key;
-
+				ptr->parentkey = posting_item->key;
 				ptr->parentblk = stack->blkno;
 				ptr->blkno = BlockIdGetBlockNumber(&posting_item->child_blkno);
 				ptr->next = stack->next;
-- 
2.49.0

Reply via email to