Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Jeff Janes
On Fri, Nov 3, 2017 at 1:34 PM, Tom Lane wrote: > Jeff Janes writes: > > With this v3 patch (assuming this is the one you just committed > > as ec42a1dcb30de235b252f6d4) am now getting make check failures. > > There's a followup commit already :-( > >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Jeff Janes writes: > With this v3 patch (assuming this is the one you just committed > as ec42a1dcb30de235b252f6d4) am now getting make check failures. There's a followup commit already :-( regards, tom lane -- Sent via pgsql-hackers mailing list

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Jeff Janes
Hi Alvaro, With this v3 patch (assuming this is the one you just committed as ec42a1dcb30de235b252f6d4) am now getting make check failures. brin_summarize_range is returning unexpected values. CentOS6, PostgreSQL 11devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Alvaro Herrera wrote: > I think your argument is sensible for some uses (where people run manual > VACUUM after loading data) but not others (where people just use manual > VACUUM in place of autovacuuming -- because they don't trust autovac, or > the schedule isn't convenient, or whatever other

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Do we still need the complication in brinsummarize to discriminate > >> against the last partial range? Now that the lock consideration > >> is gone, I think that might be a wart. > > > In the case of

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> Do we still need the complication in brinsummarize to discriminate >> against the last partial range? Now that the lock consideration >> is gone, I think that might be a wart. > In the case of VACUUM, it's not desirable to

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > > Yeah, I think this approach results in better code. The attached patch > > implements that, and it passes the test for me (incl. calling > > brin_summarize_new_values concurrently with vacuum, when running the > > insert;

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Alvaro Herrera writes: > Alvaro Herrera wrote: >> Maybe a solution is to call RelationGetNumberOfBlocks() after the >> placeholder tuple has been inserted, for the case where we would be >> scanning past end of relation; passing InvalidBlockNumber as stop point >> would

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Alvaro Herrera wrote: > Maybe a solution is to call RelationGetNumberOfBlocks() after the > placeholder tuple has been inserted, for the case where we would be > scanning past end of relation; passing InvalidBlockNumber as stop point > would indicate to do things that way. I'll try with that

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > Rather than remove the capability, I'd be inclined to make > > brin_summarize_new_values summarize the final partial range, and have > > VACUUM not do it. Would that be too inconsistent? > > That doesn't really get you out

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tomas Vondra
Hi, On 11/02/2017 06:45 PM, Alvaro Herrera wrote: > Tomas Vondra wrote: > >> Unfortunately, I think we still have a problem ... I've been wondering >> if we end up producing correct indexes, so I've done a simple test. > > Here's a proposed patch that should fix this problem (and does, in my >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> So what would happen if we just don't summarize partial ranges? > Index scan would always have to read all the heap pages for that partial > range. Maybe not a big issue, but when you finish loading a table, it'd > be good to

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> If VACUUM and brin_summarize_new_values both ignore the partial > >> range, then what else would request this? Can't we just decree > >> that we don't summarize the partial range, period? > > >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> If VACUUM and brin_summarize_new_values both ignore the partial >> range, then what else would request this? Can't we just decree >> that we don't summarize the partial range, period? > brin_summarize_range() can do it. So

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > 2. when summarization is requested on the partial range at the end of a > > table, we acquire extension lock on the rel, then compute relation size > > and run summarization with the lock held. This guarantees that we don't >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera writes: > 1. in VACUUM or brin_summarize_new_values, we only process fully loaded > ranges, and ignore the partial range at end of table. OK. > 2. when summarization is requested on the partial range at the end of a > table, we acquire extension lock on

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tomas Vondra wrote: > Unfortunately, I think we still have a problem ... I've been wondering > if we end up producing correct indexes, so I've done a simple test. Here's a proposed patch that should fix this problem (and does, in my testing). Would you please give it a try? This patch changes

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> Where are we on this --- do you want me to push the brin_doupdate >> fix I proposed, or were you intending to merge that into a >> larger patch? > Please push your fixes, I'll post my proposed patch for the other bug >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > >> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way. > >> But I see this misbehavior too. Looking ... > > > Turns out that this is related to concurrent growth of the table while > > the summarization

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera writes: >> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way. >> But I see this misbehavior too. Looking ... > Turns out that this is related to concurrent growth of the table while > the summarization process is scanning -- so new

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-01 Thread Alvaro Herrera
Alvaro Herrera wrote: > Tomas Vondra wrote: > > > FWIW I can reproduce this on 9.5, and I don't even need to run the > > UPDATE part. That is, INSERT + VACUUM running concurrently is enough to > > produce broken BRIN indexes :-( > > Hmm, I'm pretty sure we stress-tested brin in pretty much the

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tomas Vondra wrote: > FWIW I can reproduce this on 9.5, and I don't even need to run the > UPDATE part. That is, INSERT + VACUUM running concurrently is enough to > produce broken BRIN indexes :-( Hmm, I'm pretty sure we stress-tested brin in pretty much the same way. But I see this misbehavior

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tomas Vondra
On 10/31/2017 11:44 PM, Tomas Vondra wrote: > ... > Unfortunately, I think we still have a problem ... I've been wondering > if we end up producing correct indexes, so I've done a simple test. > > 1) create the table as before > > 2) let the insert + vacuum run for some time, to see if there

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tomas Vondra
Hi, On 10/31/2017 08:46 PM, Tom Lane wrote: > I wrote: >> maybe >> we just have some run-of-the-mill bugs to find, like the off-the-end >> bug I spotted in brin_doupdate. There's apparently at least one >> more, but given the error message it must be something like not >> checking for a page to

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
I wrote: > maybe > we just have some run-of-the-mill bugs to find, like the off-the-end > bug I spotted in brin_doupdate. There's apparently at least one > more, but given the error message it must be something like not > checking for a page to have turned into a revmap page. Shouldn't > be too

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> I really don't understand how any of this "let's release the buffer >> lock and then take it back later" logic is supposed to work reliably. > So summarize_range first inserts the placeholder tuple, which is there >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tom Lane wrote: > I really don't understand how any of this "let's release the buffer > lock and then take it back later" logic is supposed to work reliably. So summarize_range first inserts the placeholder tuple, which is there purposefully for other processes to update concurrently; then,

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tom Lane wrote: > So in a few more runs this morning using Alvaro's simplified test case, > I have seen the following behaviors not previously reported: > 1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index > offnum: %u" error report as PageIndexTupleDeleteNoCompact. I note

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
So in a few more runs this morning using Alvaro's simplified test case, I have seen the following behaviors not previously reported: 1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index offnum: %u" error report as PageIndexTupleDeleteNoCompact. I note the same message appears

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
Michael Paquier writes: > On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane wrote: >> Yeah, we're still missing an understanding of why we didn't see it >> before; the inadequate locking was surely there before. > Because 24992c6d has added a check on the

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Michael Paquier
On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> So: I put the blame on the fact that summarize_range() thinks that >>> the tuple offset it has for the placeholder tuple is guaranteed to >>> hold good,

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> So: I put the blame on the fact that summarize_range() thinks that >> the tuple offset it has for the placeholder tuple is guaranteed to >> hold good, even across possibly-long intervals where it's holding >> no lock on the

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Alvaro Herrera
Tom Lane wrote: > So: I put the blame on the fact that summarize_range() thinks that > the tuple offset it has for the placeholder tuple is guaranteed to > hold good, even across possibly-long intervals where it's holding > no lock on the containing buffer. Yeah, I think this is a pretty

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Alvaro Herrera
Thanks everyone for the analysis downthread. Here's a test case that provokes the crash faster. Initialize with create table brin_test (a text); create index on brin_test using brin (a) with (pages_per_range = 1); Then in one psql, run this: select brin_summarize_new_values('brin_test_a_idx')

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
I wrote: > Hmm. The index offnum being complained of is one past the end of the > lp array. I think I see what about that commit changed the behavior: > the old code for PageIndexDeleteNoCompact never changed the length > of the lp array, except in the corner case where the page is becoming >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Tomas Vondra writes: > On 10/30/2017 09:03 PM, Tom Lane wrote: >> [ scratches head ... ] Not sure how that could've introduced any >> problems? Will look. > Not sure, but I can confirm Michael's findings - I've been unable to > reproduce the issue on 1a4be103a5

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tomas Vondra
On 10/30/2017 09:03 PM, Tom Lane wrote: > Michael Paquier writes: >> b1328d78f is close enough, but per what I see that's actually not >> true. I have been able to reproduce the problem, which shows up within >> a window of 2-4 minutes. Still sometimes it can take way

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Michael Paquier writes: > b1328d78f is close enough, but per what I see that's actually not > true. I have been able to reproduce the problem, which shows up within > a window of 2-4 minutes. Still sometimes it can take way longer, and I > spotted one test where it took

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Michael Paquier
On Mon, Oct 30, 2017 at 9:42 AM, Alvaro Herrera wrote: > Tomas Vondra wrote: >> FWIW I can reproduce this on REL_10_STABLE, but not on REL9_6_STABLE. So >> it seems to be due to something that changed in the last release. > > I've been trying to reproduce it for half an