05.04.2019 19:41, Anastasia Lubennikova writes:
05.04.2019 18:01, Tom Lane writes:
Andrew Dunstan <andrew.duns...@2ndquadrant.com> writes:
On Fri, Apr 5, 2019 at 2:02 AM Thomas Munro <thomas.mu...@gmail.com>
wrote:
This is a strange failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=loach&dt=2019-04-05%2005%3A15%3A00
[ wrong answers from queries using a GIST index ]
There are a couple of other recent instances of this failure, on
francolin and whelk.
Yeah. Given three failures in a couple of days, we can reasonably
guess that the problem was introduced within a day or two prior to
the first one. Looking at what's touched GIST in that time frame,
suspicion has to fall heavily on
9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b.
If I had to bet, I'd bet that there's something wrong with the
machinations described in the commit message:
For GiST, the LSN-NSN interlock makes this a little tricky.
All pages must
be marked with a valid (i.e. non-zero) LSN, so that the
parent-child
LSN-NSN interlock works correctly. We now use magic value 1 for
that during
index build. Change the fake LSN counter to begin from 1000, so
that 1 is
safely smaller than any real or fake LSN. 2 would've been enough
for our
purposes, but let's reserve a bigger range, in case we need more
special
values in the future.
I'll go add this as an open issue.
regards, tom lane
Hi,
I've already noticed the same failure in our company buildfarm and
started the research.
You are right, it's the " Generate less WAL during GiST, GIN and
SP-GiST index build. " patch to blame.
Because of using the GistBuildLSN some pages are not linked correctly,
so index scan cannot find some entries, while seqscan finds them.
In attachment, you can find patch with a test that allows to reproduce
the bug not randomly, but on every run.
Now I'm trying to find a way to fix the issue.
The problem was caused by incorrect detection of the page to insert new
tuple after split.
If gistinserttuple() of the tuple formed by gistgetadjusted() had to
split the page, we must to go back to the parent and
descend back to the child that's a better fit for the new tuple.
Previously this was handled by the code block with the following comment:
* Concurrent split detected. There's no guarantee that the
* downlink for this page is consistent with the tuple we're
* inserting anymore, so go back to parent and rechoose the best
* child.
After introducing GistBuildNSN this code path became unreachable.
To fix it, I added new flag to detect such splits during indexbuild.
The patches with the test and fix are attached.
Many thanks to Teodor Sigaev, who helped to find the bug.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
commit 53938b423ec19a88e470b7b9552de13de5e2634a
Author: Anastasia <a.lubennik...@postgrespro.ru>
Date: Tue Apr 9 18:58:00 2019 +0300
fix for gist create index WAL optimization. Track splits correctly
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 2db790c..ff98d13 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -639,6 +639,7 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
/* Start from the root */
firststack.blkno = GIST_ROOT_BLKNO;
firststack.lsn = 0;
+ firststack.retry_from_parent = false;
firststack.parent = NULL;
firststack.downlinkoffnum = InvalidOffsetNumber;
state.stack = stack = &firststack;
@@ -693,8 +694,15 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
continue;
}
+ /*
+ * During gistbuild we cannot rely on lsn-nsn comparison,
+ * since all pages have the same GistBuildLSN value in these fields.
+ * So use special flag stack->retry_from_parent to detect the fact that
+ * we need to go back to parent and rechoose the child.
+ */
if (stack->blkno != GIST_ROOT_BLKNO &&
- stack->parent->lsn < GistPageGetNSN(stack->page))
+ ((stack->parent->lsn < GistPageGetNSN(stack->page)) ||
+ stack->retry_from_parent == true))
{
/*
* Concurrent split detected. There's no guarantee that the
@@ -786,6 +794,7 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
UnlockReleaseBuffer(stack->buffer);
xlocked = false;
state.stack = stack = stack->parent;
+ stack->retry_from_parent = true;
}
continue;
}
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 78e2e3f..361a7a8 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -215,6 +215,12 @@ typedef struct GISTInsertStack
*/
GistNSN lsn;
+ /*
+ * flag to use during gistbuild to recognize page split
+ * (instead of lsn-nsn comparison)
+ */
+ bool retry_from_parent;
+
/* offset of the downlink in the parent page, that points to this page */
OffsetNumber downlinkoffnum;
commit 53f4607b50e0516be05ef27d9369c89bdcb9d82b
Author: Anastasia <a.lubennik...@postgrespro.ru>
Date: Tue Apr 9 18:56:32 2019 +0300
test
diff --git a/contrib/intarray/Makefile b/contrib/intarray/Makefile
index 2505294..3535bb1 100644
--- a/contrib/intarray/Makefile
+++ b/contrib/intarray/Makefile
@@ -9,7 +9,7 @@ DATA = intarray--1.2.sql intarray--1.1--1.2.sql intarray--1.0--1.1.sql \
intarray--unpackaged--1.0.sql
PGFILEDESC = "intarray - functions and operators for arrays of integers"
-REGRESS = _int
+REGRESS = _int _int2
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/intarray/expected/_int2.out b/contrib/intarray/expected/_int2.out
new file mode 100644
index 0000000..8346334
--- /dev/null
+++ b/contrib/intarray/expected/_int2.out
@@ -0,0 +1,47 @@
+CREATE TABLE test__int2( a int[] );
+\copy test__int2 from 'data/test__int.data'
+ANALYZE test__int2;
+SELECT count(*) from test__int2 WHERE a && '{23,50}';
+ count
+-------
+ 403
+(1 row)
+
+SELECT count(*) from test__int2 WHERE a @@ '23|50';
+ count
+-------
+ 403
+(1 row)
+
+CREATE INDEX text_idx2 on test__int2 using gist ( a gist__int_ops );
+SELECT count(*) from test__int2 WHERE a && '{23,50}';
+ count
+-------
+ 403
+(1 row)
+
+SELECT count(*) from test__int2 WHERE a @@ '23|50';
+ count
+-------
+ 403
+(1 row)
+
+SELECT a, t FROM (SELECT a::text, (SELECT COUNT (*) FROM test__int2 t2 WHERE t2.a = t1.a) t from test__int2 t1 WHERE a IS NOT NULL) s WHERE t = 0;
+ a | t
+---+---
+(0 rows)
+
+select * from test__int2 where a = '{193,213,230,266,285,299}';
+ a
+---------------------------
+ {193,213,230,266,285,299}
+(1 row)
+
+set enable_indexscan to off;
+set enable_bitmapscan to off;
+select * from test__int2 where a = '{193,213,230,266,285,299}';
+ a
+---------------------------
+ {193,213,230,266,285,299}
+(1 row)
+
diff --git a/contrib/intarray/sql/_int2.sql b/contrib/intarray/sql/_int2.sql
new file mode 100644
index 0000000..d4adca7
--- /dev/null
+++ b/contrib/intarray/sql/_int2.sql
@@ -0,0 +1,20 @@
+CREATE TABLE test__int2( a int[] );
+\copy test__int2 from 'data/test__int.data'
+ANALYZE test__int2;
+
+SELECT count(*) from test__int2 WHERE a && '{23,50}';
+SELECT count(*) from test__int2 WHERE a @@ '23|50';
+
+CREATE INDEX text_idx2 on test__int2 using gist ( a gist__int_ops );
+
+SELECT count(*) from test__int2 WHERE a && '{23,50}';
+SELECT count(*) from test__int2 WHERE a @@ '23|50';
+
+SELECT a, t FROM (SELECT a::text, (SELECT COUNT (*) FROM test__int2 t2 WHERE t2.a = t1.a) t from test__int2 t1 WHERE a IS NOT NULL) s WHERE t = 0;
+
+select * from test__int2 where a = '{193,213,230,266,285,299}';
+
+set enable_indexscan to off;
+set enable_bitmapscan to off;
+
+select * from test__int2 where a = '{193,213,230,266,285,299}';
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 94b6ad6..2183b3a 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -509,6 +509,7 @@ gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */
{
/* we didn't make the random choice yet for this old best */
keep_current_best = (random() <= (MAX_RANDOM_VALUE / 2)) ? 1 : 0;
+ keep_current_best = 0; /* FIXME only for debugging */
}
if (keep_current_best == 0)
{
@@ -531,6 +532,7 @@ gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */
{
/* we didn't make the random choice yet for this old best */
keep_current_best = (random() <= (MAX_RANDOM_VALUE / 2)) ? 1 : 0;
+ keep_current_best = 0; /* FIXME only for debugging */
}
if (keep_current_best == 1)
break;