Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Jeff Janes
On Fri, Nov 3, 2017 at 1:34 PM, Tom Lane wrote: > Jeff Janes writes: > > With this v3 patch (assuming this is the one you just committed > > as ec42a1dcb30de235b252f6d4) am now getting make check failures. > > There's a followup commit already :-( > >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Jeff Janes writes: > With this v3 patch (assuming this is the one you just committed > as ec42a1dcb30de235b252f6d4) am now getting make check failures. There's a followup commit already :-( regards, tom lane -- Sent via pgsql-hackers mailing list

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Jeff Janes
Hi Alvaro, With this v3 patch (assuming this is the one you just committed as ec42a1dcb30de235b252f6d4) am now getting make check failures. brin_summarize_range is returning unexpected values. CentOS6, PostgreSQL 11devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Alvaro Herrera wrote: > I think your argument is sensible for some uses (where people run manual > VACUUM after loading data) but not others (where people just use manual > VACUUM in place of autovacuuming -- because they don't trust autovac, or > the schedule isn't convenient, or whatever other

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Do we still need the complication in brinsummarize to discriminate > >> against the last partial range? Now that the lock consideration > >> is gone, I think that might be a wart. > > > In the case of

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> Do we still need the complication in brinsummarize to discriminate >> against the last partial range? Now that the lock consideration >> is gone, I think that might be a wart. > In the case of VACUUM, it's not desirable to

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > > Yeah, I think this approach results in better code. The attached patch > > implements that, and it passes the test for me (incl. calling > > brin_summarize_new_values concurrently with vacuum, when running the > > insert;

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Alvaro Herrera writes: > Alvaro Herrera wrote: >> Maybe a solution is to call RelationGetNumberOfBlocks() after the >> placeholder tuple has been inserted, for the case where we would be >> scanning past end of relation; passing InvalidBlockNumber as stop point >> would

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Alvaro Herrera wrote: > Maybe a solution is to call RelationGetNumberOfBlocks() after the > placeholder tuple has been inserted, for the case where we would be > scanning past end of relation; passing InvalidBlockNumber as stop point > would indicate to do things that way. I'll try with that

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > Rather than remove the capability, I'd be inclined to make > > brin_summarize_new_values summarize the final partial range, and have > > VACUUM not do it. Would that be too inconsistent? > > That doesn't really get you out

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tomas Vondra
Hi, On 11/02/2017 06:45 PM, Alvaro Herrera wrote: > Tomas Vondra wrote: > >> Unfortunately, I think we still have a problem ... I've been wondering >> if we end up producing correct indexes, so I've done a simple test. > > Here's a proposed patch that should fix this problem (and does, in my >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> So what would happen if we just don't summarize partial ranges? > Index scan would always have to read all the heap pages for that partial > range. Maybe not a big issue, but when you finish loading a table, it'd > be good to

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> If VACUUM and brin_summarize_new_values both ignore the partial > >> range, then what else would request this? Can't we just decree > >> that we don't summarize the partial range, period? > > >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> If VACUUM and brin_summarize_new_values both ignore the partial >> range, then what else would request this? Can't we just decree >> that we don't summarize the partial range, period? > brin_summarize_range() can do it. So

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > 2. when summarization is requested on the partial range at the end of a > > table, we acquire extension lock on the rel, then compute relation size > > and run summarization with the lock held. This guarantees that we don't >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera writes: > 1. in VACUUM or brin_summarize_new_values, we only process fully loaded > ranges, and ignore the partial range at end of table. OK. > 2. when summarization is requested on the partial range at the end of a > table, we acquire extension lock on

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tomas Vondra wrote: > Unfortunately, I think we still have a problem ... I've been wondering > if we end up producing correct indexes, so I've done a simple test. Here's a proposed patch that should fix this problem (and does, in my testing). Would you please give it a try? This patch changes

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> Where are we on this --- do you want me to push the brin_doupdate >> fix I proposed, or were you intending to merge that into a >> larger patch? > Please push your fixes, I'll post my proposed patch for the other bug >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > >> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way. > >> But I see this misbehavior too. Looking ... > > > Turns out that this is related to concurrent growth of the table while > > the summarization

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera writes: >> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way. >> But I see this misbehavior too. Looking ... > Turns out that this is related to concurrent growth of the table while > the summarization process is scanning -- so new

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-01 Thread Alvaro Herrera
Alvaro Herrera wrote: > Tomas Vondra wrote: > > > FWIW I can reproduce this on 9.5, and I don't even need to run the > > UPDATE part. That is, INSERT + VACUUM running concurrently is enough to > > produce broken BRIN indexes :-( > > Hmm, I'm pretty sure we stress-tested brin in pretty much the

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tomas Vondra wrote: > FWIW I can reproduce this on 9.5, and I don't even need to run the > UPDATE part. That is, INSERT + VACUUM running concurrently is enough to > produce broken BRIN indexes :-( Hmm, I'm pretty sure we stress-tested brin in pretty much the same way. But I see this misbehavior

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tomas Vondra
On 10/31/2017 11:44 PM, Tomas Vondra wrote: > ... > Unfortunately, I think we still have a problem ... I've been wondering > if we end up producing correct indexes, so I've done a simple test. > > 1) create the table as before > > 2) let the insert + vacuum run for some time, to see if there

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tomas Vondra
Hi, On 10/31/2017 08:46 PM, Tom Lane wrote: > I wrote: >> maybe >> we just have some run-of-the-mill bugs to find, like the off-the-end >> bug I spotted in brin_doupdate. There's apparently at least one >> more, but given the error message it must be something like not >> checking for a page to

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
I wrote: > maybe > we just have some run-of-the-mill bugs to find, like the off-the-end > bug I spotted in brin_doupdate. There's apparently at least one > more, but given the error message it must be something like not > checking for a page to have turned into a revmap page. Shouldn't > be too

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> I really don't understand how any of this "let's release the buffer >> lock and then take it back later" logic is supposed to work reliably. > So summarize_range first inserts the placeholder tuple, which is there >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tom Lane wrote: > I really don't understand how any of this "let's release the buffer > lock and then take it back later" logic is supposed to work reliably. So summarize_range first inserts the placeholder tuple, which is there purposefully for other processes to update concurrently; then,

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tom Lane wrote: > So in a few more runs this morning using Alvaro's simplified test case, > I have seen the following behaviors not previously reported: > 1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index > offnum: %u" error report as PageIndexTupleDeleteNoCompact. I note

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
So in a few more runs this morning using Alvaro's simplified test case, I have seen the following behaviors not previously reported: 1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index offnum: %u" error report as PageIndexTupleDeleteNoCompact. I note the same message appears

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
Michael Paquier writes: > On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane wrote: >> Yeah, we're still missing an understanding of why we didn't see it >> before; the inadequate locking was surely there before. > Because 24992c6d has added a check on the

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Michael Paquier
On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> So: I put the blame on the fact that summarize_range() thinks that >>> the tuple offset it has for the placeholder tuple is guaranteed to >>> hold good,

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> So: I put the blame on the fact that summarize_range() thinks that >> the tuple offset it has for the placeholder tuple is guaranteed to >> hold good, even across possibly-long intervals where it's holding >> no lock on the

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Alvaro Herrera
Tom Lane wrote: > So: I put the blame on the fact that summarize_range() thinks that > the tuple offset it has for the placeholder tuple is guaranteed to > hold good, even across possibly-long intervals where it's holding > no lock on the containing buffer. Yeah, I think this is a pretty

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Alvaro Herrera
Thanks everyone for the analysis downthread. Here's a test case that provokes the crash faster. Initialize with create table brin_test (a text); create index on brin_test using brin (a) with (pages_per_range = 1); Then in one psql, run this: select brin_summarize_new_values('brin_test_a_idx')

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
I wrote: > Hmm. The index offnum being complained of is one past the end of the > lp array. I think I see what about that commit changed the behavior: > the old code for PageIndexDeleteNoCompact never changed the length > of the lp array, except in the corner case where the page is becoming >

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Tomas Vondra writes: > On 10/30/2017 09:03 PM, Tom Lane wrote: >> [ scratches head ... ] Not sure how that could've introduced any >> problems? Will look. > Not sure, but I can confirm Michael's findings - I've been unable to > reproduce the issue on 1a4be103a5

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tomas Vondra
On 10/30/2017 09:03 PM, Tom Lane wrote: > Michael Paquier writes: >> b1328d78f is close enough, but per what I see that's actually not >> true. I have been able to reproduce the problem, which shows up within >> a window of 2-4 minutes. Still sometimes it can take way

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Michael Paquier writes: > b1328d78f is close enough, but per what I see that's actually not > true. I have been able to reproduce the problem, which shows up within > a window of 2-4 minutes. Still sometimes it can take way longer, and I > spotted one test where it took

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Michael Paquier
On Mon, Oct 30, 2017 at 9:42 AM, Alvaro Herrera wrote: > Tomas Vondra wrote: >> FWIW I can reproduce this on REL_10_STABLE, but not on REL9_6_STABLE. So >> it seems to be due to something that changed in the last release. > > I've been trying to reproduce it for half an

[HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Alvaro Herrera
Tomas Vondra wrote: > FWIW I can reproduce this on REL_10_STABLE, but not on REL9_6_STABLE. So > it seems to be due to something that changed in the last release. I've been trying to reproduce it for half an hour with no success (I think my laptop is just too slow). I bet this is related to the

[HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-29 Thread Tomas Vondra
FWIW I can reproduce this on REL_10_STABLE, but not on REL9_6_STABLE. So it seems to be due to something that changed in the last release. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via