Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Jeff Janes
On Fri, Nov 3, 2017 at 1:34 PM, Tom Lane  wrote:

> Jeff Janes  writes:
> > With this v3 patch (assuming this is the one you just committed
> > as ec42a1dcb30de235b252f6d4) I am now getting make check failures.
>
> There's a followup commit already :-(
>
> regards, tom lane
>

Yeah, that fixed it.  Thanks.

Jeff


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Jeff Janes  writes:
> With this v3 patch (assuming this is the one you just committed
> as ec42a1dcb30de235b252f6d4) I am now getting make check failures.

There's a followup commit already :-(

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Jeff Janes
Hi Alvaro,

With this v3 patch (assuming this is the one you just committed
as ec42a1dcb30de235b252f6d4) I am now getting make check failures.

brin_summarize_range is returning unexpected values.

CentOS 6,
PostgreSQL 11devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit

Cheers,

Jeff


regression.diffs
Description: Binary data



Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Alvaro Herrera wrote:

> I think your argument is sensible for some uses (where people run manual
> VACUUM after loading data) but not others (where people just use manual
> VACUUM in place of autovacuuming -- because they don't trust autovac, or
> the schedule isn't convenient, or whatever other reason).  I've seen
> both things being done in production.

BTW I also noticed that creating an index does summarize the first range
(rather than leaving it empty, as this rationale would suggest).  I
think we changed this at the last minute before commit in 9.5 and never
revisited it; I'm inclined to change it for pg11 (but not now, of course).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> Do we still need the complication in brinsummarize to discriminate
> >> against the last partial range?  Now that the lock consideration
> >> is gone, I think that might be a wart.
> 
> > In the case of VACUUM, it's not desirable to create a summarization for
> > the last partial range, because if the table is still being filled, that
> > would slow down the insertion process.
> 
> Hm.  Okay, but you should change the comment then, because "we do not want
> to spend one RelationGetNumberOfBlocks call" is a pretty weak reason.

Changed.

> Also, I think I would accept that argument for autovacuum, but maybe
> not so much for a manual vacuum.  Maybe you should drive it off
> IsAutovacuumWorker rather than which operation is being done.

I think your argument is sensible for some uses (where people run manual
VACUUM after loading data) but not others (where people just use manual
VACUUM in place of autovacuuming -- because they don't trust autovac, or
the schedule isn't convenient, or whatever other reason).  I've seen
both things being done in production.  If we do as you suggest, there is
no way to do a manual vacuum without also summarizing the partial range;
the approach of doing the partial range only in brin_summarize_new_values
lets the user choose what to do.

Once upon a time I thought about adding a reloption to let the user
choose what to do, but I never got around to writing a patch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Do we still need the complication in brinsummarize to discriminate
>> against the last partial range?  Now that the lock consideration
>> is gone, I think that might be a wart.

> In the case of VACUUM, it's not desirable to create a summarization for
> the last partial range, because if the table is still being filled, that
> would slow down the insertion process.

Hm.  Okay, but you should change the comment then, because "we do not want
to spend one RelationGetNumberOfBlocks call" is a pretty weak reason.

Also, I think I would accept that argument for autovacuum, but maybe
not so much for a manual vacuum.  Maybe you should drive it off
IsAutovacuumWorker rather than which operation is being done.

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> 
> > Yeah, I think this approach results in better code.  The attached patch
> > implements that, and it passes the test for me (incl. calling
> > brin_summarize_new_values concurrently with vacuum, when running the
> > insert; the former does include the final page range whereas the latter
> > does not.)
> 
> Hm, so IIUC the point is that once the placeholder tuple is in, we can
> rely on concurrent inserters to update it for insertions into pages that
> are added after we determine our scan stop point.  But if the scan stop
> point is chosen before inserting the placeholder, then we have a race
> condition.

Exactly.  We don't need to scan those pages once the placeholder tuple
is in.

> The given code seems a brick or so short of a load, though: if the table
> has been extended sufficiently, it could compute scanNumBlks as larger
> than bs_pagesPerRange, no?  You need to clamp the computation result.

Oops, right.

> Also, shouldn't the passed-in heapBlk always be a multiple of
> pagesPerRange already?

Yeah, I guess I can turn that into an assert.
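For illustration, the clamp plus that assertion might look like the following stand-alone sketch (the helper name and the Min macro here are made up, with simplified types; the real logic lives in summarize_range in brin.c):

```c
#include <assert.h>

typedef unsigned int BlockNumber;

#define Min(a, b) ((a) < (b) ? (a) : (b))

/*
 * Compute how many blocks to scan for the range starting at heapBlk.
 * Clamp the result so we never scan more than pagesPerRange blocks,
 * even if the table grew well past the range boundary after we
 * re-read its size.
 */
static BlockNumber
clamp_scan_blocks(BlockNumber heapNumBlocks, BlockNumber heapBlk,
				  BlockNumber pagesPerRange)
{
	/* the caller should always pass a range-aligned start block */
	assert(heapBlk % pagesPerRange == 0);

	return Min(heapNumBlocks - heapBlk, pagesPerRange);
}
```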

> Do we still need the complication in brinsummarize to discriminate
> against the last partial range?  Now that the lock consideration
> is gone, I think that might be a wart.

You mean this code?

		/*
		 * Unless requested to summarize even a partial range, go away now if
		 * we think the next range is partial.
		 *
		 * Maybe the table already grew to cover this range completely, but we
		 * don't want to spend a whole RelationGetNumberOfBlocks to find out,
		 * so just leave it for next time.
		 */
		if (!include_partial &&
			(startBlk + pagesPerRange > heapNumBlocks))
			break;

In the case of VACUUM, it's not desirable to create a summarization for
the last partial range, because if the table is still being filled, that
would slow down the insertion process.  So we pass include_partial=false
there.  In brin_summarize_new_values, the theory is that the user called
that function because they're done loading (at least temporarily) so
it's better to process the partial range.
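To make the two call sites concrete, the decision that check implements can be modeled as a stand-alone sketch (illustrative only; the function name is made up, and the real code sits inline in brinsummarize):

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int BlockNumber;

/*
 * Decide whether the range starting at startBlk should be summarized,
 * given the (possibly stale) table size heapNumBlocks.  A range that
 * extends past the known end of the table is presumed partial and is
 * skipped unless the caller asked for partial ranges too -- VACUUM
 * passes include_partial=false, brin_summarize_new_values passes true.
 */
static bool
should_summarize(BlockNumber startBlk, BlockNumber pagesPerRange,
				 BlockNumber heapNumBlocks, bool include_partial)
{
	if (!include_partial &&
		(startBlk + pagesPerRange > heapNumBlocks))
		return false;			/* leave the partial range for next time */
	return true;
}
```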

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 6d31088c39d455637179853a3c72a7d9e20fe053 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 2 Nov 2017 18:35:10 +0100
Subject: [PATCH v3] Fix summarization concurrent with relation extension

---
 src/backend/access/brin/brin.c | 91 --
 1 file changed, 62 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index e6909d7aea..72ac8929bd 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -67,7 +67,7 @@ static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
 				  BrinRevmap *revmap, BlockNumber pagesPerRange);
 static void terminate_brin_buildstate(BrinBuildState *state);
 static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
-			  double *numSummarized, double *numExisting);
+			  bool include_partial, double *numSummarized, double *numExisting);
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
 			 BrinTuple *b);
@@ -791,7 +791,7 @@ brinvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 
 	brin_vacuum_scan(info->index, info->strategy);
 
-	brinsummarize(info->index, heapRel, BRIN_ALL_BLOCKRANGES,
+	brinsummarize(info->index, heapRel, BRIN_ALL_BLOCKRANGES, false,
 				  &stats->num_index_tuples, &stats->num_index_tuples);
 
 	heap_close(heapRel, AccessShareLock);
@@ -909,7 +909,7 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 						RelationGetRelationName(indexRel))));
 
 	/* OK, do it */
-	brinsummarize(indexRel, heapRel, heapBlk, &numSummarized, NULL);
+	brinsummarize(indexRel, heapRel, heapBlk, true, &numSummarized, NULL);
 
 	relation_close(indexRel, ShareUpdateExclusiveLock);
 	relation_close(heapRel, ShareUpdateExclusiveLock);
@@ -1129,7 +1129,8 @@ terminate_brin_buildstate(BrinBuildState *state)
 }
 
 /*
- * Summarize the given page range of the given index.
+ * On the given BRIN index, summarize the heap page range that corresponds
+ * to the heap block number given.
  *
  * This routine can run in parallel with insertions into the heap.  To avoid
  * missing those values from the summary tuple, we first insert a placeholder
@@ -1139,6 +1140,12 @@ terminate_brin_buildstate(BrinBuildState *state)
  * update of the index value happens 

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Alvaro Herrera  writes:
> Alvaro Herrera wrote:
>> Maybe a solution is to call RelationGetNumberOfBlocks() after the
>> placeholder tuple has been inserted, for the case where we would be
>> scanning past end of relation; passing InvalidBlockNumber as stop point
>> would indicate to do things that way.  I'll try with that approach now.

> Yeah, I think this approach results in better code.  The attached patch
> implements that, and it passes the test for me (incl. calling
> brin_summarize_new_values concurrently with vacuum, when running the
> insert; the former does include the final page range whereas the latter
> does not.)

Hm, so IIUC the point is that once the placeholder tuple is in, we can
rely on concurrent inserters to update it for insertions into pages that
are added after we determine our scan stop point.  But if the scan stop
point is chosen before inserting the placeholder, then we have a race
condition.

The given code seems a brick or so short of a load, though: if the table
has been extended sufficiently, it could compute scanNumBlks as larger
than bs_pagesPerRange, no?  You need to clamp the computation result.
Also, shouldn't the passed-in heapBlk always be a multiple of
pagesPerRange already?

Do we still need the complication in brinsummarize to discriminate
against the last partial range?  Now that the lock consideration
is gone, I think that might be a wart.

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Alvaro Herrera wrote:

> Maybe a solution is to call RelationGetNumberOfBlocks() after the
> placeholder tuple has been inserted, for the case where we would be
> scanning past end of relation; passing InvalidBlockNumber as stop point
> would indicate to do things that way.  I'll try with that approach now.

Yeah, I think this approach results in better code.  The attached patch
implements that, and it passes the test for me (incl. calling
brin_summarize_new_values concurrently with vacuum, when running the
insert; the former does include the final page range whereas the latter
does not.)

Tomas Vondra wrote:

> FWIW this patch fixes the issue for me - I can no longer reproduce the
> bitmapscan vs. seqscan result discrepancies (even with the extra UPDATE
> phase).

Thanks for testing!  This confirms that the issue was correctly
identified.  Would you try the current patch, which is better than the
other one?  It's sufficiently different that I think it invalidates the
prior testing.

Thanks!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From b3568a5475526fbd7768bd4d74c6ad681ede920f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 2 Nov 2017 18:35:10 +0100
Subject: [PATCH v2] Fix summarization concurrent with relation extension

---
 src/backend/access/brin/brin.c | 78 ++
 1 file changed, 49 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index e6909d7aea..ad512c2c5f 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -67,7 +67,7 @@ static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
 				  BrinRevmap *revmap, BlockNumber pagesPerRange);
 static void terminate_brin_buildstate(BrinBuildState *state);
 static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
-			  double *numSummarized, double *numExisting);
+			  bool include_partial, double *numSummarized, double *numExisting);
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
 			 BrinTuple *b);
@@ -791,7 +791,7 @@ brinvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 
 	brin_vacuum_scan(info->index, info->strategy);
 
-	brinsummarize(info->index, heapRel, BRIN_ALL_BLOCKRANGES,
+	brinsummarize(info->index, heapRel, BRIN_ALL_BLOCKRANGES, false,
 				  &stats->num_index_tuples, &stats->num_index_tuples);
 
 	heap_close(heapRel, AccessShareLock);
@@ -909,7 +909,7 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 						RelationGetRelationName(indexRel))));
 
 	/* OK, do it */
-	brinsummarize(indexRel, heapRel, heapBlk, &numSummarized, NULL);
+	brinsummarize(indexRel, heapRel, heapBlk, true, &numSummarized, NULL);
 
 	relation_close(indexRel, ShareUpdateExclusiveLock);
 	relation_close(heapRel, ShareUpdateExclusiveLock);
@@ -1129,7 +1129,8 @@ terminate_brin_buildstate(BrinBuildState *state)
 }
 
 /*
- * Summarize the given page range of the given index.
+ * On the given BRIN index, summarize the heap page range that corresponds
+ * to the heap block number given.
  *
  * This routine can run in parallel with insertions into the heap.  To avoid
  * missing those values from the summary tuple, we first insert a placeholder
@@ -1139,6 +1140,12 @@ terminate_brin_buildstate(BrinBuildState *state)
  * update of the index value happens in a loop, so that if somebody updates
  * the placeholder tuple after we read it, we detect the case and try again.
  * This ensures that the concurrently inserted tuples are not lost.
+ *
+ * A further corner case is this routine being asked to summarize the partial
+ * range at the end of the table.  heapNumBlocks is the (possibly outdated)
+ * table size; if we notice that the requested range lies beyond that size,
+ * we re-compute the table size after inserting the placeholder tuple, to
+ * avoid missing pages that were appended recently.
  */
 static void
 summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
@@ -1160,6 +1167,20 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
 				   heapBlk, phtup, phsz);
 
 	/*
+	 * Whenever we're scanning what we believe to be the final range on the
+	 * table (i.e. one that might be partial) we need to recompute our idea of
+	 * what the latest page is after inserting the placeholder tuple.  This
+	 * ensures that we don't miss pages just because they were extended by
+	 * concurrent backends that didn't have the chance to see our placeholder
+	 * tuple.  Fortunately, this should occur 

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:

> > Rather than remove the capability, I'd be inclined to make
> > brin_summarize_new_values summarize the final partial range, and have
> > VACUUM not do it.  Would that be too inconsistent?
> 
> That doesn't really get you out of the problem that this is an abuse of
> the relation extension lock, and is likely to cause issues when people
> try to optimize that locking mechanism.

Right.

> Why is it that the regular technique doesn't work, ie create a placeholder
> tuple and let it get added to by any insertions that happen?

The problem is that we determine relation size (and scan stop point)
before inserting the placeholder tuple, so any relation extension that
occurs after we read the size is not covered by the scan.  The reason we
do this is to avoid calling RelationGetNumberOfBlocks once for each
range, since it's known to be very expensive.

Maybe a solution is to call RelationGetNumberOfBlocks() after the
placeholder tuple has been inserted, for the case where we would be
scanning past end of relation; passing InvalidBlockNumber as stop point
would indicate to do things that way.  I'll try with that approach now.
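As a stand-alone sketch of that idea (hypothetical names; the relation is stubbed out with a counter here, and the real logic lives in brinsummarize/summarize_range):

```c
#include <assert.h>

typedef unsigned int BlockNumber;

#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Stub standing in for RelationGetNumberOfBlocks(); in this sketch the
 * "relation" is just a counter that a test can grow to simulate
 * concurrent extension.
 */
static BlockNumber current_rel_size;

static BlockNumber
get_number_of_blocks(void)
{
	return current_rel_size;
}

/*
 * Decide the scan stop point for one range.  If the caller passed
 * InvalidBlockNumber, the range may be the final (partial) one, so we
 * re-read the relation size -- conceptually *after* the placeholder
 * tuple is in place, so that inserters into any pages added past this
 * point update the placeholder themselves and nothing is missed.
 */
static BlockNumber
scan_stop_point(BlockNumber requested_stop)
{
	if (requested_stop == InvalidBlockNumber)
		return get_number_of_blocks();	/* re-read after placeholder insert */
	return requested_stop;
}
```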

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tomas Vondra
Hi,

On 11/02/2017 06:45 PM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
> 
>> Unfortunately, I think we still have a problem ... I've been wondering
>> if we end up producing correct indexes, so I've done a simple test.
> 
> Here's a proposed patch that should fix this problem (and does, in my
> testing).  Would you please give it a try?
> 
> This patch changes two things:
> 
> 1. in VACUUM or brin_summarize_new_values, we only process fully loaded
> ranges, and ignore the partial range at end of table.
> 
> 2. when summarization is requested on the partial range at the end of a
> table, we acquire extension lock on the rel, then compute relation size
> and run summarization with the lock held.  This guarantees that we don't
> miss any pages.  This is bad for concurrency though, so it's only done
> in that specific scenario.
> 

FWIW this patch fixes the issue for me - I can no longer reproduce the
bitmapscan vs. seqscan result discrepancies (even with the extra UPDATE
phase).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> So what would happen if we just don't summarize partial ranges?

> Index scan would always have to read all the heap pages for that partial
> range.  Maybe not a big issue, but when you finish loading a table, it'd
> be good to have a mechanism to summarize that partial range ...

I guess if the ranges are big this might not be nice.

> Rather than remove the capability, I'd be inclined to make
> brin_summarize_new_values summarize the final partial range, and have
> VACUUM not do it.  Would that be too inconsistent?

That doesn't really get you out of the problem that this is an abuse of
the relation extension lock, and is likely to cause issues when people
try to optimize that locking mechanism.

Why is it that the regular technique doesn't work, ie create a placeholder
tuple and let it get added to by any insertions that happen?

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> If VACUUM and brin_summarize_new_values both ignore the partial
> >> range, then what else would request this?  Can't we just decree
> >> that we don't summarize the partial range, period?
> 
> > brin_summarize_range() can do it.
> 
> So what would happen if we just don't summarize partial ranges?

Index scan would always have to read all the heap pages for that partial
range.  Maybe not a big issue, but when you finish loading a table, it'd
be good to have a mechanism to summarize that partial range ...

Rather than remove the capability, I'd be inclined to make
brin_summarize_new_values summarize the final partial range, and have
VACUUM not do it.  Would that be too inconsistent?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> If VACUUM and brin_summarize_new_values both ignore the partial
>> range, then what else would request this?  Can't we just decree
>> that we don't summarize the partial range, period?

> brin_summarize_range() can do it.

So what would happen if we just don't summarize partial ranges?

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:

> > 2. when summarization is requested on the partial range at the end of a
> > table, we acquire extension lock on the rel, then compute relation size
> > and run summarization with the lock held.  This guarantees that we don't
> > miss any pages.  This is bad for concurrency though, so it's only done
> > in that specific scenario.
> 
> Hm, I wonder how this will play with the active proposals around
> reimplementing relation extension locks.  All that work seems to be
> assuming that the extension lock is only held for a short time and
> nothing much beyond physical extension is done while holding it.
> I'm afraid that you may be introducing a risk of e.g. deadlocks
> if you do this.

Ouch ... yeah, that could be a problem.

Another idea I had was to just insert the placeholder tuple while
holding the extension lock, then release the lock while the
summarization is done.  It would be a bit of a break of the current
separation of concerns, but I'm not convinced that the current setup is
perfect, so maybe that's okay.

> If VACUUM and brin_summarize_new_values both ignore the partial
> range, then what else would request this?  Can't we just decree
> that we don't summarize the partial range, period?

brin_summarize_range() can do it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera  writes:
> 1. in VACUUM or brin_summarize_new_values, we only process fully loaded
> ranges, and ignore the partial range at end of table.

OK.

> 2. when summarization is requested on the partial range at the end of a
> table, we acquire extension lock on the rel, then compute relation size
> and run summarization with the lock held.  This guarantees that we don't
> miss any pages.  This is bad for concurrency though, so it's only done
> in that specific scenario.

Hm, I wonder how this will play with the active proposals around
reimplementing relation extension locks.  All that work seems to be
assuming that the extension lock is only held for a short time and
nothing much beyond physical extension is done while holding it.
I'm afraid that you may be introducing a risk of e.g. deadlocks
if you do this.

If VACUUM and brin_summarize_new_values both ignore the partial
range, then what else would request this?  Can't we just decree
that we don't summarize the partial range, period?

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tomas Vondra wrote:

> Unfortunately, I think we still have a problem ... I've been wondering
> if we end up producing correct indexes, so I've done a simple test.

Here's a proposed patch that should fix this problem (and does, in my
testing).  Would you please give it a try?

This patch changes two things:

1. in VACUUM or brin_summarize_new_values, we only process fully loaded
ranges, and ignore the partial range at end of table.

2. when summarization is requested on the partial range at the end of a
table, we acquire extension lock on the rel, then compute relation size
and run summarization with the lock held.  This guarantees that we don't
miss any pages.  This is bad for concurrency though, so it's only done
in that specific scenario.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From a760bfd44afeff8d1629599411ac7ce87acc7d26 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 2 Nov 2017 18:35:10 +0100
Subject: [PATCH] Fix summarization concurrent with relation extension

---
 src/backend/access/brin/brin.c | 91 --
 1 file changed, 70 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b3aa6d1ced..7696c0700c 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -29,6 +29,7 @@
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
+#include "storage/lmgr.h"
 #include "utils/builtins.h"
 #include "utils/index_selfuncs.h"
 #include "utils/memutils.h"
@@ -68,6 +69,10 @@ static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
 static void terminate_brin_buildstate(BrinBuildState *state);
 static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 			  double *numSummarized, double *numExisting);
+static void determine_summarization_range(BlockNumber pageRange,
+						  BlockNumber heapNumBlocks,
+						  BlockNumber pagesPerRange,
+						  BlockNumber *startBlk, BlockNumber *endBlk);
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
 			 BrinTuple *b);
@@ -1253,30 +1258,16 @@ brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 	BlockNumber startBlk;
 	BlockNumber endBlk;
 
-	/* determine range of pages to process; nothing to do for an empty table */
-	heapNumBlocks = RelationGetNumberOfBlocks(heapRel);
-	if (heapNumBlocks == 0)
-		return;
-
 	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
 
-	if (pageRange == BRIN_ALL_BLOCKRANGES)
+	/* determine range of pages to process */
+	heapNumBlocks = RelationGetNumberOfBlocks(heapRel);
+	determine_summarization_range(pageRange, heapNumBlocks, pagesPerRange,
+								  &startBlk, &endBlk);
+	if (startBlk > heapNumBlocks)
 	{
-		startBlk = 0;
-		endBlk = heapNumBlocks;
-	}
-	else
-	{
-		startBlk = (pageRange / pagesPerRange) * pagesPerRange;
-		/* Nothing to do if start point is beyond end of table */
-		if (startBlk > heapNumBlocks)
-		{
-			brinRevmapTerminate(revmap);
-			return;
-		}
-		endBlk = startBlk + pagesPerRange;
-		if (endBlk > heapNumBlocks)
-			endBlk = heapNumBlocks;
+		brinRevmapTerminate(revmap);
+		return;
 	}
 
 	/*
@@ -1287,9 +1278,31 @@ brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 	{
 		BrinTuple  *tup;
 		OffsetNumber off;
+		bool		ext_lock_held = false;
 
 		CHECK_FOR_INTERRUPTS();
 
+		/*
+		 * Whenever we're scanning a range that would go past what we know to
+		 * be end-of-relation, we need to ensure we scan to the true end of the
+		 * relation, or we risk missing tuples in recently added pages.  To do
+		 * this, we hold the relation extension lock from here till we're done
+		 * summarizing, and compute the relation size afresh now.  The logic in
+		 * determine_summarization_range ensures that this is not done in the
+		 * common cases of vacuum or brin_summarize_new_values(), but instead
+		 * only when we're specifically asked to summarize the last range in
+		 * the relation.
+		 */
+		if (heapBlk + pagesPerRange > 

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Where are we on this --- do you want me to push the brin_doupdate
>> fix I proposed, or were you intending to merge that into a
>> larger patch?

> Please push your fixes, I'll post my proposed patch for the other bug
> afterwards; they are unrelated problems after all.

OK, I was not sure if you considered them related or not.  Will push.

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> >> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way.
> >> But I see this misbehavior too.  Looking ...
> 
> > Turns out that this is related to concurrent growth of the table while
> > the summarization process is scanning -- so new pages have appeared at
> > the end of the table after the end point has been determined.  It would
> > be a pain to determine number of blocks for each range, so I'm looking
> > for a simple way to fix it without imposing so much overhead.
> 
> Where are we on this --- do you want me to push the brin_doupdate
> fix I proposed, or were you intending to merge that into a
> larger patch?

Please push your fixes, I'll post my proposed patch for the other bug
afterwards; they are unrelated problems after all.

If you prefer me to push your fixes, I can do that -- let me know.

> If I'm to do it, is there a reason not to back-patch to all branches
> with BRIN?

No, all these fixes should go back to 9.5.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera  writes:
>> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way.
>> But I see this misbehavior too.  Looking ...

> Turns out that this is related to concurrent growth of the table while
> the summarization process is scanning -- so new pages have appeared at
> the end of the table after the end point has been determined.  It would
> be a pain to determine the number of blocks for each range, so I'm looking
> for a simple way to fix it without imposing so much overhead.

Where are we on this --- do you want me to push the brin_doupdate
fix I proposed, or were you intending to merge that into a
larger patch?  If I'm to do it, is there a reason not to back-patch
to all branches with BRIN?

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-01 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Tomas Vondra wrote:
> 
> > FWIW I can reproduce this on 9.5, and I don't even need to run the
> > UPDATE part. That is, INSERT + VACUUM running concurrently is enough to
> > produce broken BRIN indexes :-(
> 
> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way.
> But I see this misbehavior too.  Looking ...

Turns out that this is related to concurrent growth of the table while
the summarization process is scanning -- so new pages have appeared at
the end of the table after the end point has been determined.  It would
be a pain to determine the number of blocks for each range, so I'm looking
for a simple way to fix it without imposing so much overhead.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tomas Vondra wrote:

> FWIW I can reproduce this on 9.5, and I don't even need to run the
> UPDATE part. That is, INSERT + VACUUM running concurrently is enough to
> produce broken BRIN indexes :-(

Hmm, I'm pretty sure we stress-tested brin in pretty much the same way.
But I see this misbehavior too.  Looking ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tomas Vondra


On 10/31/2017 11:44 PM, Tomas Vondra wrote:
> ...
> Unfortunately, I think we still have a problem ... I've been wondering
> if we end up producing correct indexes, so I've done a simple test.
> 
> 1) create the table as before
> 
> 2) let the insert + vacuum run for some time, to see if there are
> crashes (result: no crashes after one hour, inserting ~92M rows)
> 
> 3) do a bunch of random updates on the data (while still doing the
> concurrent vacuum in another session)
> 
> 4) run a bunch of simple queries to compare the results, essentially
> 
>-- BRIN index
>SET enable_bitmapscan = on;
>SELECT COUNT(*) FROM brin_test WHERE a = $1;
> 
> 
>-- seq scan
>SET enable_bitmapscan = off;
>SELECT COUNT(*) FROM brin_test WHERE a = $1;
> 
> and unfortunately what I get is not particularly pleasant:
> 
> test=# set enable_bitmapscan = on;
> SET
> test=# select count(*) from brin_test where a = 0;
>  count
> ---
>   9062
> (1 row)
> 
> test=# set enable_bitmapscan = off;
> SET
> test=# select count(*) from brin_test where a = 0;
>  count
> ---
>   9175
> (1 row)
> 
> Attached is a SQL script with commands I used. You'll need to copy the
> commands into multiple psql sessions, though, to simulate concurrent
> activity).
> 

FWIW I can reproduce this on 9.5, and I don't even need to run the
UPDATE part. That is, INSERT + VACUUM running concurrently is enough to
produce broken BRIN indexes :-(


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tomas Vondra
Hi,

On 10/31/2017 08:46 PM, Tom Lane wrote:
> I wrote:
>> maybe
>> we just have some run-of-the-mill bugs to find, like the off-the-end
>> bug I spotted in brin_doupdate.  There's apparently at least one
>> more, but given the error message it must be something like not
>> checking for a page to have turned into a revmap page.  Shouldn't
>> be too hard to find...
> 
> Actually, I think it might be as simple as the attached.
> brin_getinsertbuffer checks for the old page having turned into revmap,
> but the "samepage" path in brin_doupdate does not :-(
> 
> With this applied, Alvaro's version of the test case has survived
> without error for quite a bit longer than its former MTBF.  There
> might still be some issues though in other code paths.
> 

That does fix the crashes for me - I've been unable to reproduce any
even after one hour (it took a couple of minutes to crash before).

Unfortunately, I think we still have a problem ... I've been wondering
if we end up producing correct indexes, so I've done a simple test.

1) create the table as before

2) let the insert + vacuum run for some time, to see if there are
crashes (result: no crashes after one hour, inserting ~92M rows)

3) do a bunch of random updates on the data (while still doing the
concurrent vacuum in another session)

4) run a bunch of simple queries to compare the results, essentially

   -- BRIN index
   SET enable_bitmapscan = on;
   SELECT COUNT(*) FROM brin_test WHERE a = $1;


   -- seq scan
   SET enable_bitmapscan = off;
   SELECT COUNT(*) FROM brin_test WHERE a = $1;

and unfortunately what I get is not particularly pleasant:

test=# set enable_bitmapscan = on;
SET
test=# select count(*) from brin_test where a = 0;
 count
---
  9062
(1 row)

test=# set enable_bitmapscan = off;
SET
test=# select count(*) from brin_test where a = 0;
 count
---
  9175
(1 row)

Attached is a SQL script with commands I used. You'll need to copy the
commands into multiple psql sessions, though, to simulate concurrent
activity).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


brin-test.sql
Description: application/sql



Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
I wrote:
> maybe
> we just have some run-of-the-mill bugs to find, like the off-the-end
> bug I spotted in brin_doupdate.  There's apparently at least one
> more, but given the error message it must be something like not
> checking for a page to have turned into a revmap page.  Shouldn't
> be too hard to find...

Actually, I think it might be as simple as the attached.
brin_getinsertbuffer checks for the old page having turned into revmap,
but the "samepage" path in brin_doupdate does not :-(

With this applied, Alvaro's version of the test case has survived
without error for quite a bit longer than its former MTBF.  There
might still be some issues though in other code paths.

regards, tom lane

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e..b0f86f3 100644
*** a/src/backend/access/brin/brin_pageops.c
--- b/src/backend/access/brin/brin_pageops.c
*** brin_doupdate(Relation idxrel, BlockNumb
*** 113,121 
  
  	/*
  	 * Check that the old tuple wasn't updated concurrently: it might have
! 	 * moved someplace else entirely ...
  	 */
! 	if (!ItemIdIsNormal(oldlp))
  	{
  		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
  
--- 113,127 
  
  	/*
  	 * Check that the old tuple wasn't updated concurrently: it might have
! 	 * moved someplace else entirely, and for that matter the whole page
! 	 * might've become a revmap page.  Note that in the first two cases
! 	 * checked here, the "oldlp" we just calculated is garbage; but
! 	 * PageGetItemId() is simple enough that it was safe to do that
! 	 * calculation anyway.
  	 */
! 	if (!BRIN_IS_REGULAR_PAGE(oldpage) ||
! 		oldoff > PageGetMaxOffsetNumber(oldpage) ||
! 		!ItemIdIsNormal(oldlp))
  	{
  		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
  



Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> I really don't understand how any of this "let's release the buffer
>> lock and then take it back later" logic is supposed to work reliably.

> So summarize_range first inserts the placeholder tuple, which is there
> purposefully for other processes to update concurrently; then, without
> blocking any other process, scan the page range and update the
> placeholder tuple (this could take a long time, so it'd be a bad idea
> to hold buffer lock for that long).

Yeah, agreed, and your upthread point about avoiding deadlock when we
need to take two buffer locks at the same time is also something that
it's hard to see any other way around.  Maybe we just have to find a
way to make the existing structure work.  The sticking point is that
not only might the tuple we expected have been deleted, but someone
could have put an unrelated tuple in its place.  Are you confident
that you can detect that and recover from it?  If you're sure that
brin_tuples_equal() is trustworthy for this purpose, then maybe
we just have some run-of-the-mill bugs to find, like the off-the-end
bug I spotted in brin_doupdate.  There's apparently at least one
more, but given the error message it must be something like not
checking for a page to have turned into a revmap page.  Shouldn't
be too hard to find...

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tom Lane wrote:

> I really don't understand how any of this "let's release the buffer
> lock and then take it back later" logic is supposed to work reliably.

So summarize_range first inserts the placeholder tuple, which is there
purposefully for other processes to update concurrently; then, without
blocking any other process, scan the page range and update the
placeholder tuple (this could take a long time, so it'd be a bad idea
to hold buffer lock for that long).

I think what we should do is rethink the locking considerations in
brin_doupdate vs. brinGetTupleForHeapBlock, and how they are used in
summarize_range and brininsert.  In summarize_range, instead of hoping
that in some cases we will not need to re-obtain the placeholder tuple,
just do that in all cases keeping the buffer locked until the tuple is
updated.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tom Lane wrote:
> So in a few more runs this morning using Alvaro's simplified test case,
> I have seen the following behaviors not previously reported:

> 1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index
> offnum: %u" error report as PageIndexTupleDeleteNoCompact.  I note the
> same message appears in plain PageIndexTupleDelete as well.
> I think we've been too hasty to assume all instances of this came out of
> PageIndexTupleDeleteNoCompact.

Ah, I wasn't paying close attention to the originator routine of the
message, but you're right, I see this one too.

> 2. Crashes in the data-insertion process, not only the process running
> summarize_range:

Yeah, I saw these.  I was expecting it, since the two routines
(brininsert and summarize_range) pretty much share the insertion
protocol.

> I really don't understand how any of this "let's release the buffer
> lock and then take it back later" logic is supposed to work reliably.

Yeah, evidently that was way too optimistic and I'll need to figure out
a better mechanism to handle this.

The intention was to avoid deadlocks while locking the target page for
the insertion: by having both pages start unlocked we can simply lock
them in block number order.  If we keep the page containing the tuple
locked, I don't see how to reliably avoid a deadlock while acquiring a
buffer to insert the new tuple.

> BTW, while I'm bitching, it seems fairly insane from a concurrency
> standpoint that brin_getinsertbuffer is calling RecordPageWithFreeSpace
> while holding at least one and possibly two buffer locks.  Shouldn't
> that be done someplace else?

Hmm.  I spent a lot of effort (commit ccc4c074994d) to avoid leaving
pages uninitialized / unrecorded in FSM.  I left this on purpose on the
rationale that trying to fix it would make the callsites more convoluted
(the retry logic doesn't help).  But as I recall this was supposed to be
done only in the rare case where the buffer could not be returned to
caller ... but that's not what the current code does, so there is
something wrong there.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
So in a few more runs this morning using Alvaro's simplified test case,
I have seen the following behaviors not previously reported:

1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index
offnum: %u" error report as PageIndexTupleDeleteNoCompact.  I note the
same message appears in plain PageIndexTupleDelete as well.
I think we've been too hasty to assume all instances of this came out of
PageIndexTupleDeleteNoCompact.

2. Crashes in the data-insertion process, not only the process running
summarize_range:

#1  0x003b78a33c75 in abort () at abort.c:92
#2  0x0086e527 in errfinish (dummy=) at elog.c:557
#3  0x0086f334 in elog_finish (elevel=, 
fmt=) at elog.c:1378
#4  0x0075d48f in PageIndexTupleDeleteNoCompact (
rel=, buf=2772, page=0x7f0188438080 "\002", offnum=39)
at bufpage.c:995
#5  0x00476fd2 in brin_doupdate (idxrel=0x7f0186633e90, 
pagesPerRange=1, revmap=0xba, heapBlk=2542, oldbuf=2772, oldoff=39, 
origtup=0x2784010, origsz=8, newtup=0x2784098, newsz=400, 
samepage=0 '\000') at brin_pageops.c:270
#6  0x00475430 in brininsert (idxRel=0x7f0186633e90, 
values=0x7ffce7b827f0, nulls=0x7ffce7b828f0 "", heaptid=0x279fb3c, 
heapRel=, checkUnique=, 
indexInfo=0x279e370) at brin.c:292
#7  0x0061da18 in ExecInsertIndexTuples (slot=0x279e638, 
tupleid=0x279fb3c, estate=0x279dd10, noDupErr=0 '\000', specConflict=0x0, 
arbiterIndexes=0x0) at execIndexing.c:387

The postmaster log for this, with even more debug logging thrown in:

2017-10-31 11:58:03.466 EDT [23506] LOG:  
summarize_range(brin_test_a_idx,2542,2543) created placeholder at 68,39
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select 
brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG:  brin_doupdate(brin_test_a_idx) at 
68,39: getinsertbuffer returned page 70
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select 
brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG:  deleting tuple 39 (of 39) in rel 
brin_test_a_idx page 68
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select 
brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG:  brin_doupdate(brin_test_a_idx) at 
68,39: placed at 70,1
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select 
brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23509] LOG:  brin_doupdate(brin_test_a_idx) at 
68,39: getinsertbuffer returned page 70
2017-10-31 11:58:03.466 EDT [23509] STATEMENT:  insert into brin_test values 
(repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23509] LOG:  deleting tuple 39 (of 38) in rel 
brin_test_a_idx page 68
2017-10-31 11:58:03.466 EDT [23509] STATEMENT:  insert into brin_test values 
(repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23509] PANIC:  invalid index offnum: 39
2017-10-31 11:58:03.466 EDT [23509] STATEMENT:  insert into brin_test values 
(repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23506] LOG:  
summarize_range(brin_test_a_idx,2542,2543): update at 68,39 succeeded
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select 
brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:04.517 EDT [23493] LOG:  server process (PID 23509) was 
terminated by signal 6: Aborted

So what evidently happened here is that brininsert decided it needed to
insert into the placeholder tuple that summarize_range had just created,
but by the time it got lock on the buffer, summarize_range had deleted
the placeholder and instead created a new tuple on another page.

HAH: I see how the connection to PageIndexTupleDeleteNoCompact's
change applies.  After re-locking the buffer, brin_doupdate tries
to check whether the tuple's been deleted, but it needs to do it
more like this, else it might be accessing off the end of the
lp array:

/*
 * Check that the old tuple wasn't updated concurrently: it might have
 * moved someplace else entirely ...
 */
-   if (!ItemIdIsNormal(oldlp))
+   if (oldoff > PageGetMaxOffsetNumber(oldpage) ||
+   !ItemIdIsNormal(oldlp))
{
LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);

This is a pre-existing bug, but before 24992c6d, it would only
have manifested in the (very?) unusual corner case that
PageIndexDeleteNoCompact was able to reset the page to empty.

However, when I fix that and then run Alvaro's test case, I get
ERROR:  corrupted BRIN index: inconsistent range map
so there's more creepy-crawlies around here somewhere.

I really don't understand how any of this "let's release the buffer
lock and then take it back later" logic is supposed to work reliably.

BTW, while I'm bitching, it seems fairly insane from a concurrency
standpoint that brin_getinsertbuffer is calling RecordPageWithFreeSpace
while holding at least one and possibly two buffer locks.  Shouldn't
that be done someplace else?

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane  wrote:
>> Yeah, we're still missing an understanding of why we didn't see it
>> before; the inadequate locking was surely there before.

> Because 24992c6d has added a check on the offset number by using
> PageIndexTupleDeleteNoCompact() in brin_doupdate() making checks
> tighter, no?

No, I don't see how it's tighter.  The old code matched supplied
offnum(s) against the indexes of not-unused items, and then after
that loop it complained if they weren't all matched.  So it should
also have failed, albeit with a different error message, if it were
passed an offnum corresponding to a no-longer-live tuple.

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Michael Paquier
On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> So: I put the blame on the fact that summarize_range() thinks that
>>> the tuple offset it has for the placeholder tuple is guaranteed to
>>> hold good, even across possibly-long intervals where it's holding
>>> no lock on the containing buffer.
>
>> Yeah, I think this is a pretty reasonable explanation for the problem.
>> I don't understand why it doesn't fail in 9.6.
>
> Yeah, we're still missing an understanding of why we didn't see it
> before; the inadequate locking was surely there before.  I'm guessing
> that somehow the previous behavior of PageIndexDeleteNoCompact managed
> to mask the problem (perhaps only by not throwing an error, which doesn't
> imply that the index state was good afterwards).  But I don't see quite
> how it did that.

Because 24992c6d has added a check on the offset number by using
PageIndexTupleDeleteNoCompact() in brin_doupdate() making checks
tighter, no? I have not tested, and I lack knowledge about the brin
code, but it seems to me that if we had a similar check then things
could likely blow up.
-- 
Michael




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> So: I put the blame on the fact that summarize_range() thinks that
>> the tuple offset it has for the placeholder tuple is guaranteed to
>> hold good, even across possibly-long intervals where it's holding
>> no lock on the containing buffer.

> Yeah, I think this is a pretty reasonable explanation for the problem.
> I don't understand why it doesn't fail in 9.6.

Yeah, we're still missing an understanding of why we didn't see it
before; the inadequate locking was surely there before.  I'm guessing
that somehow the previous behavior of PageIndexDeleteNoCompact managed
to mask the problem (perhaps only by not throwing an error, which doesn't
imply that the index state was good afterwards).  But I don't see quite
how it did that.

One thing I think we do know now is that the bug is triggered by two
concurrent executions of summarize_range.  So I'd look for edge cases
like whether the placeholder tuple can be deleted and then reinserted
into the same lp index.

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Alvaro Herrera
Tom Lane wrote:

> So: I put the blame on the fact that summarize_range() thinks that
> the tuple offset it has for the placeholder tuple is guaranteed to
> hold good, even across possibly-long intervals where it's holding
> no lock on the containing buffer.

Yeah, I think this is a pretty reasonable explanation for the problem.
I don't understand why it doesn't fail in 9.6.

> Fixing this without creating new problems is beyond my familiarity
> with the BRIN code.  It looks like it might be nontrivial :-(

Looking.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Alvaro Herrera
Thanks everyone for the analysis downthread.  Here's a test case that
provokes the crash faster.  Initialize with

create table brin_test (a text);
create index on brin_test using brin (a) with (pages_per_range = 1);

Then in one psql, run this:
select brin_summarize_new_values('brin_test_a_idx') \watch 0.1

and a pgbench running a script with this line (one client is enough):

insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), 
int4(random() * 200)));

It crashes in 10-20 seconds for me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
I wrote:
> Hmm.  The index offnum being complained of is one past the end of the
> lp array.  I think I see what about that commit changed the behavior:
> the old code for PageIndexDeleteNoCompact never changed the length
> of the lp array, except in the corner case where the page is becoming
> completely empty.  The new code will shorten the lp array (decrease
> phdr->pd_lower) if it's told to remove the last item.  So if you make
> two successive calls specifying the same offnum, and it's the last one
> on the page, the second one will fail with the symptoms we see here.
> However, so far as I can see, a sequence like that would have failed
> before too, just with a different error message, because once the
> first call had marked the item unused, the second call would not see
> it as a candidate to match.  So I'm not quite sure how that's related
> ... but it seems like it must be.

I'm still confused about why it didn't fail before, but after adding
some additional code to log each call of PageIndexTupleDeleteNoCompact,
I think I've got a smoking gun:

2017-10-30 18:18:44.321 EDT [10932] LOG:  deleting tuple 292 (of 292) in rel 
brin_test_c_idx page 2
2017-10-30 18:18:44.321 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:18:44.393 EDT [10932] LOG:  deleting tuple 292 (of 292) in rel 
brin_test_d_idx page 2
2017-10-30 18:18:44.393 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:18:53.428 EDT [10932] LOG:  deleting tuple 186 (of 186) in rel 
brin_test_e_idx page 3
2017-10-30 18:18:53.428 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:19:13.794 EDT [10938] LOG:  deleting tuple 186 (of 186) in rel 
brin_test_e_idx page 4
2017-10-30 18:19:13.794 EDT [10938] STATEMENT:  insert into brin_test select
 mod(i,10)/25,
 mod(i,10)/25,
 mod(i,10)/25,
 mod(i,10)/25,
 md5((mod(i,10)/25)::text)::uuid
from generate_series(1,10) s(i)
2017-10-30 18:19:13.795 EDT [10932] LOG:  deleting tuple 186 (of 185) in rel 
brin_test_e_idx page 4
2017-10-30 18:19:13.795 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:19:13.795 EDT [10932] PANIC:  invalid index offnum: 186
2017-10-30 18:19:13.795 EDT [10932] STATEMENT:  vacuum brin_test

So what happened here is that the inserting process decided to
summarize concurrently with the VACUUM process, and the inserting
process deleted (or maybe just updated/moved?) the placeholder tuple
that VACUUM was expecting to delete, and then we get the PANIC because
the tuple we're expecting to delete is already gone.

So: I put the blame on the fact that summarize_range() thinks that
the tuple offset it has for the placeholder tuple is guaranteed to
hold good, even across possibly-long intervals where it's holding
no lock on the containing buffer.

Fixing this without creating new problems is beyond my familiarity
with the BRIN code.  It looks like it might be nontrivial :-(

regards, tom lane

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e..04ad804 100644
*** a/src/backend/access/brin/brin_pageops.c
--- b/src/backend/access/brin/brin_pageops.c
*** brin_doupdate(Relation idxrel, BlockNumb
*** 243,249 
  		if (extended)
  			brin_page_init(BufferGetPage(newbuf), BRIN_PAGETYPE_REGULAR);
  
! 		PageIndexTupleDeleteNoCompact(oldpage, oldoff);
  		newoff = PageAddItem(newpage, (Item) newtup, newsz,
  			 InvalidOffsetNumber, false, false);
  		if (newoff == InvalidOffsetNumber)
--- 243,249 
  		if (extended)
  			brin_page_init(BufferGetPage(newbuf), BRIN_PAGETYPE_REGULAR);
  
! 		PageIndexTupleDeleteNoCompact(idxrel, oldbuf, oldpage, oldoff);
  		newoff = PageAddItem(newpage, (Item) newtup, newsz,
  			 InvalidOffsetNumber, false, false);
  		if (newoff == InvalidOffsetNumber)
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076..4d5dad3 100644
*** a/src/backend/access/brin/brin_revmap.c
--- b/src/backend/access/brin/brin_revmap.c
*** brinRevmapDesummarizeRange(Relation idxr
*** 409,415 
  	ItemPointerSetInvalid();
  	brinSetHeapBlockItemptr(revmapBuf, revmap->rm_pagesPerRange, heapBlk,
  			invalidIptr);
! 	PageIndexTupleDeleteNoCompact(regPg, regOffset);
  	/* XXX record free space in FSM? */
  
  	MarkBufferDirty(regBuf);
--- 409,415 
  	ItemPointerSetInvalid();
  	brinSetHeapBlockItemptr(revmapBuf, revmap->rm_pagesPerRange, heapBlk,
  			invalidIptr);
! 	PageIndexTupleDeleteNoCompact(idxrel, regBuf, regPg, regOffset);
  	/* XXX record free space in FSM? */
  
  	MarkBufferDirty(regBuf);
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index 60daa54..c8cc9f8 100644
*** a/src/backend/access/brin/brin_xlog.c
--- b/src/backend/access/brin/brin_xlog.c
*** brin_xlog_update(XLogReaderState *record
*** 150,156 
  
  		offnum 

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Tomas Vondra  writes:
> On 10/30/2017 09:03 PM, Tom Lane wrote:
>> [ scratches head ... ]  Not sure how that could've introduced any
>> problems?  Will look.

> Not sure, but I can confirm Michael's findings - I've been unable to
> reproduce the issue on 1a4be103a5 even after 20 minutes, and on
> 24992c6db9 it failed after only 2.

Hmm.  The index offnum being complained of is one past the end of the
lp array.  I think I see what about that commit changed the behavior:
the old code for PageIndexDeleteNoCompact never changed the length
of the lp array, except in the corner case where the page is becoming
completely empty.  The new code will shorten the lp array (decrease
phdr->pd_lower) if it's told to remove the last item.  So if you make
two successive calls specifying the same offnum, and it's the last one
on the page, the second one will fail with the symptoms we see here.
However, so far as I can see, a sequence like that would have failed
before too, just with a different error message, because once the
first call had marked the item unused, the second call would not see
it as a candidate to match.  So I'm not quite sure how that's related
... but it seems like it must be.

Anyway, my opinion ATM is that PageIndexTupleDeleteNoCompact is fine,
and what we're looking at is some kind of bug in summarize_range
or brin_doupdate, causing them to pass a offset beyond the end of
the page in some corner case.

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tomas Vondra


On 10/30/2017 09:03 PM, Tom Lane wrote:
> Michael Paquier  writes:
>> b1328d78f is close enough, but per what I see that's actually not
>> true. I have been able to reproduce the problem, which shows up within
>> a window of 2-4 minutes. Still sometimes it can take way longer, and I
>> spotted one test where it took 15 minutes to show up... So, bisecting
>> with a test that looks for core files for 20 minutes, I am seeing that
>> the following commit is actually at fault:
> 
>> commit 24992c6db9fd40f342db1f22747ec9e56483796d
>> Author: Tom Lane 
>> Date:   Fri Sep 9 19:00:59 2016 -0400
> 
>> Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.
> 
> [ scratches head ... ]  Not sure how that could've introduced any
> problems?  Will look.
> 

Not sure, but I can confirm Michael's findings - I've been unable to
reproduce the issue on 1a4be103a5 even after 20 minutes, and on
24992c6db9 it failed after only 2.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Michael Paquier  writes:
> b1328d78f is close enough, but from what I see that's actually not
> true. I have been able to reproduce the problem, which shows up within
> a window of 2-4 minutes. Still sometimes it can take way longer, and I
> spotted one test where it took 15 minutes to show up... So, bisecting
> with a test that looks for core files for 20 minutes, I am seeing that
> the following commit is actually at fault:

> commit 24992c6db9fd40f342db1f22747ec9e56483796d
> Author: Tom Lane 
> Date:   Fri Sep 9 19:00:59 2016 -0400

> Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.

[ scratches head ... ]  Not sure how that could've introduced any
problems?  Will look.

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Michael Paquier
On Mon, Oct 30, 2017 at 9:42 AM, Alvaro Herrera  wrote:
> Tomas Vondra wrote:
>> FWIW I can reproduce this on REL_10_STABLE, but not on REL9_6_STABLE. So
>> it seems to be due to something that changed in the last release.
>
> I've been trying to reproduce it for half an hour with no success (I
> think my laptop is just too slow). I bet this is related to the
> addition of PageIndexTupleOverwrite (commit b1328d78f) though I find it
> more likely that this was not *caused* by that commit but rather just
> made it easier to hit.

b1328d78f is close enough, but from what I see that's actually not
true. I have been able to reproduce the problem, which shows up within
a window of 2-4 minutes. Still sometimes it can take way longer, and I
spotted one test where it took 15 minutes to show up... So, bisecting
with a test that looks for core files for 20 minutes, I am seeing that
the following commit is actually at fault:
commit 24992c6db9fd40f342db1f22747ec9e56483796d
Author: Tom Lane 
Date:   Fri Sep 9 19:00:59 2016 -0400

Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.

The full generality of deleting an arbitrary number of tuples is no longer
needed, so let's save some code and cycles by replacing the original coding
with an implementation based on PageIndexTupleDelete.
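The bisect-with-an-automated-test workflow above can be sketched, in toy
form, with "git bisect run" on a scratch repository (everything here —
repo, file names, the grep check — is illustrative; the real test waited
up to 20 minutes per step and looked for core files):

```shell
#!/bin/sh
# Toy demonstration of driving a bisection with an automated check.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.invalid
git config user.name demo

echo good > state.txt
git add state.txt
git commit -qm "good 1"
git commit -qm "good 2" --allow-empty
echo bad > state.txt
git commit -qam "introduce bug"          # the commit bisect should blame
git commit -qm "later 1" --allow-empty
git commit -qm "later 2" --allow-empty
culprit=$(git rev-parse HEAD~2)

# Mark the endpoints (HEAD bad, HEAD~4 known good) and let bisect drive
# the search; the run script stands in for the crash reproducer.
git bisect start HEAD HEAD~4 >/dev/null
git bisect run sh -c 'grep -q good state.txt' >/dev/null 2>&1
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "bisect blamed: $found"
echo "bug lives at:  $culprit"
[ "$found" = "$culprit" ] && echo "match"
```

"git bisect run" treats exit status 0 from the script as "good" and
nonzero as "bad", which is why a crash-detecting reproducer slots in
directly.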
-- 
Michael

