On 04/01/2019 02:47, Andrey Borodin wrote:
2 янв. 2019 г., в 20:35, Heikki Linnakangas <hlinn...@iki.fi> написал(а):

In patch #1, to do the vacuum scan in physical order:
...
I think this is ready to be committed, except that I didn't do any testing. We 
discussed the need for testing earlier. Did you write some test scripts for 
this, or how have you been testing?
Please see test I used to check left jumps for v18:
0001-Physical-GiST-scan-in-VACUUM-v18-with-test-modificat.patch
0002-Test-left-jumps-v18.patch

To trigger FollowRight GiST sometimes forget to clear follow-right marker simulating crash of an 
insert. This fills logs with "fixing incomplete split" messages. Search for "REMOVE 
THIS" to disable these ill-behavior triggers.
To trigger NSN jump GiST allocate empty page after every real allocation.

In log output I see
2019-01-03 22:27:30.028 +05 [54596] WARNING:  RESCAN TRIGGERED BY NSN
WARNING:  RESCAN TRIGGERED BY NSN
2019-01-03 22:27:30.104 +05 [54596] WARNING:  RESCAN TRIGGERED BY FollowRight
This means that code paths were really executed (for v18).

Thanks! As I noted at https://www.postgresql.org/message-id/2ff57b1f-01b4-eacf-36a2-485a12017f6e%40iki.fi, the test patch left the index corrupt. I fixed it so that it leaves behind incompletely split pages, without the corruption, see attached. It's similar to yours, but let me recap what it does:

* Hacks gistbuild(), create 100 empty pages immediately after the root pages. They are leaked, so they won't be reused until the a VACUUM sees them and puts them to the FSM

* Hacks gistinserttuples(), to leave the split incompleted with 50% probability

* In vacuum, print a line to the log whenever it needs to "jump left"

I used this patch, with the attached test script that's similar to yours, but it also tries to verify that the index returns correct results. It prints a result set like this:

   sum
---------
 -364450
  364450
(2 rows)

If the two numbers add up to 0, the index seems valid. And you should see "RESCAN" lines in the log, to show that jumping left happened. Because the behavior is random and racy, you may need to run the script many times to see both "RESCAN TRIGGERED BY NSN" and "RESCAN TRIGGERED BY FollowRight" cases. Especially the "FollowRight" case happens less frequently than the NSN case, you might need to run the script > 10 times to see it.

I also tried your amcheck tool with this. It did not report any errors.

Attached is also latest version of the patch itself. It is the same as your latest patch v19, except for some tiny comment kibitzing. I'll mark this as Ready for Committer in the commitfest app, and will try to commit it in the next couple of days.

- Heikki

Attachment: gist-vacuum-test.sh
Description: application/shellscript

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 3f52b8f4dc..cad4a2a46e 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1187,6 +1187,8 @@ gistinserttuple(GISTInsertState *state, GISTInsertStack *stack,
 							InvalidBuffer, InvalidBuffer, false, false);
 }
 
+static bool HACK = false;
+
 /* ----------------
  * An extended workhorse version of gistinserttuple(). This version allows
  * inserting multiple tuples, or replacing a single tuple with multiple tuples.
@@ -1230,6 +1232,14 @@ gistinserttuples(GISTInsertState *state, GISTInsertStack *stack,
 	CheckForSerializableConflictIn(state->r, NULL, stack->buffer);
 
 	/* Insert the tuple(s) to the page, splitting the page if necessary */
+	if (BufferIsValid(leftchild) && HACK)
+	{
+		/* skip actually inserting */
+		splitinfo = NULL;
+		is_split = false;
+	}
+	else
+	{
 	is_split = gistplacetopage(state->r, state->freespace, giststate,
 							   stack->buffer,
 							   tuples, ntup,
@@ -1238,6 +1248,7 @@ gistinserttuples(GISTInsertState *state, GISTInsertStack *stack,
 							   &splitinfo,
 							   true,
 							   state->heapRel);
+	}
 
 	/*
 	 * Before recursing up in case the page was split, release locks on the
@@ -1256,7 +1267,12 @@ gistinserttuples(GISTInsertState *state, GISTInsertStack *stack,
 	 * didn't have to split, release it ourselves.
 	 */
 	if (splitinfo)
+	{
+		if (random() % 2 == 0)
+			HACK = true;
 		gistfinishsplit(state, stack, giststate, splitinfo, unlockbuf);
+		HACK = false;
+	}
 	else if (unlockbuf)
 		LockBuffer(stack->buffer, GIST_UNLOCK);
 
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index bd142a3560..fdfc54b009 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -201,6 +201,9 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	buildstate.indtuples = 0;
 	buildstate.indtuplesSize = 0;
 
+	for (int i = 0; i < 100; i++)
+		ReleaseBuffer(ReadBuffer(index, P_NEW));
+
 	/*
 	 * Do the heap scan.
 	 */
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 1cbcb038f7..c306621e1e 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -254,6 +254,11 @@ restart:
 			(opaque->rightlink != InvalidBlockNumber) &&
 			(opaque->rightlink < orig_blkno))
 		{
+			if (GistFollowRight(page))
+				elog(LOG, "RESCAN %u->%u TRIGGERED BY FollowRight", blkno, opaque->rightlink);
+			if (vstate->startNSN < GistPageGetNSN(page))
+				elog(LOG,"RESCAN %u->%u TRIGGERED BY NSN", blkno, opaque->rightlink);
+
 			recurse_to = opaque->rightlink;
 		}
 

Reply via email to