Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-10 Thread Tom Lane
Andres Freund writes: > On 2019-05-07 09:17:11 -0700, Andres Freund wrote: >> Well, it rejiggers the way table locks are acquired for all REINDEX >> INDEX commands, not just in the CONCURRENTLY. But yea, it's probably >> easy to catch issues there on user tables. > Pushed now. OK. I marked the

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-10 Thread Andres Freund
Hi, On 2019-05-07 09:17:11 -0700, Andres Freund wrote: > Hi, > > On 2019-05-07 12:14:43 -0400, Tom Lane wrote: > > Andres Freund writes: > > > On 2019-05-07 12:07:37 -0400, Tom Lane wrote: > > >> The number of deadlock failures is kind of annoying, so I'd rather remove > > >> the tests from

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-07 Thread Andres Freund
Hi, On 2019-05-07 12:14:43 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2019-05-07 12:07:37 -0400, Tom Lane wrote: > >> The number of deadlock failures is kind of annoying, so I'd rather remove > >> the tests from HEAD sooner than later. What issues around that do you > >> think remain

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-07 Thread Tom Lane
Andres Freund writes: > On 2019-05-07 12:07:37 -0400, Tom Lane wrote: >> The number of deadlock failures is kind of annoying, so I'd rather remove >> the tests from HEAD sooner than later. What issues around that do you >> think remain that these tests would be helpful for? > I was wondering

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-07 Thread Andres Freund
Hi, On 2019-05-07 12:07:37 -0400, Tom Lane wrote: > Andres Freund writes: > > Yea, that might be right. I'm planning to leave the tests in until a > > bunch of the open REINDEX issues are resolved. Not super likely that > > it'd break something, but probably worth anyway? > > The number of

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-07 Thread Tom Lane
Andres Freund writes: > Yea, that might be right. I'm planning to leave the tests in until a > bunch of the open REINDEX issues are resolved. Not super likely that > it'd break something, but probably worth anyway? The number of deadlock failures is kind of annoying, so I'd rather remove the

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-07 Thread Andres Freund
Hi, On 2019-05-07 10:50:19 -0400, Tom Lane wrote: > Andres Freund writes: > > I for sure thought I earlier had an idea that'd actually work. But > > either I've lost it, or it didn't actually work. But perhaps somebody > > else can come up with something based on the above strawman ideas? > >

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-07 Thread Tom Lane
Andres Freund writes: > I for sure thought I earlier had an idea that'd actually work. But > either I've lost it, or it didn't actually work. But perhaps somebody > else can come up with something based on the above strawman ideas? Both of those ideas fail if an autovacuum starts up after you're

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-06 Thread Andres Freund
On 2019-05-06 00:00:04 -0400, Tom Lane wrote: > Andres Freund writes: > > On May 5, 2019 8:56:58 PM PDT, Tom Lane wrote: > >> On this coast, "tonight" is running into "tomorrow" ... you planning > >> to do that soon? > > > I'd planned to finish cooking and eating, and then doing it. Seems like

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-05 Thread Tom Lane
Andres Freund writes: > On May 5, 2019 8:56:58 PM PDT, Tom Lane wrote: >> On this coast, "tonight" is running into "tomorrow" ... you planning >> to do that soon? > I'd planned to finish cooking and eating, and then doing it. Seems like > that'd be plenty early? Sure, dinner can take

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-05 Thread Andres Freund
Hi, On May 5, 2019 8:56:58 PM PDT, Tom Lane wrote: >Andres Freund writes: >> On 2019-05-04 11:04:07 -0400, Tom Lane wrote: >>> I don't think we discussed exactly what "come out" means. My >thought is >>> to leave the test scripts in place (so they can be invoked manually >with >>> EXTRA_TESTS)

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-05 Thread Tom Lane
Andres Freund writes: > On 2019-05-04 11:04:07 -0400, Tom Lane wrote: >> I don't think we discussed exactly what "come out" means. My thought is >> to leave the test scripts in place (so they can be invoked manually with >> EXTRA_TESTS) but remove them from the schedule files. > Yea, that

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-05 Thread Andres Freund
Hi, On 2019-05-04 11:04:07 -0400, Tom Lane wrote: > I don't think we discussed exactly what "come out" means. My thought is > to leave the test scripts in place (so they can be invoked manually with > EXTRA_TESTS) but remove them from the schedule files. Yea, that sounds sensible. I'll do so by

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-04 Thread Tom Lane
Michael Paquier writes: > sidewinder is still pissed of as of HEAD, pointing visibly to f912d7d > as the root cause: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder=2019-05-03%2021%3A45%3A00 Right, the deadlocks are expected when some previous session is slow about cleaning

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-04 Thread Michael Paquier
On Thu, May 02, 2019 at 07:18:19PM -0400, Tom Lane wrote: > I did manually verify that all branches get through "reindex table > pg_class" and "reindex index pg_class_oid_index" under > CLOBBER_CACHE_ALWAYS, as well as a normal-mode check-world. But CCA > world runs seem like a good idea.

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund writes: > On 2019-05-02 16:54:11 -0400, Tom Lane wrote: >> I just finished a successful run of the core regression tests with CCA. >> Given the calendar, I think that's about as much CCA testing as I should >> do personally. I'll make a cleanup pass on this patch and try to get it

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund writes: > On 2019-05-02 16:54:11 -0400, Tom Lane wrote: >> How do you feel about the other patch to rejigger the order of operations >> in CommandCounterIncrement? I think that's a bug, but it's probably >> noncritical for most people. What I'm leaning towards for that one is >>

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Andres Freund
Hi, On 2019-05-02 16:54:11 -0400, Tom Lane wrote: > I just finished a successful run of the core regression tests with CCA. > Given the calendar, I think that's about as much CCA testing as I should > do personally. I'll make a cleanup pass on this patch and try to get it > pushed within a few

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund writes: > On 2019-05-02 12:59:55 -0400, Tom Lane wrote: >> One interesting thing that turns up in check-world is that if wal_level >> is minimal, we have to manually force an XID to be assigned, else >> reindexing pg_class fails with "cannot commit a transaction that deleted >>

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Andres Freund
Hi, On 2019-05-02 12:59:55 -0400, Tom Lane wrote: > I wrote: > > Andres Freund writes: > >> On 2019-05-02 11:41:28 -0400, Tom Lane wrote: > >>> But don't we need to worry about resetting relfrozenxid? > > >> Indexes don't have that though? We couldn't do it for pg_class itself, > >> but that's

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
I wrote: > Andres Freund writes: >> On 2019-05-02 11:41:28 -0400, Tom Lane wrote: >>> But don't we need to worry about resetting relfrozenxid? >> Indexes don't have that though? We couldn't do it for pg_class itself, >> but that's not a problem here. > Hmm. Again, that seems like the sort of

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund writes: > On 2019-05-02 11:41:28 -0400, Tom Lane wrote: >> But don't we need to worry about resetting relfrozenxid? > Indexes don't have that though? We couldn't do it for pg_class itself, > but that's not a problem here. Hmm. Again, that seems like the sort of assumption that

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Andres Freund
Hi, On 2019-05-02 11:41:28 -0400, Tom Lane wrote: > Andres Freund writes: > > ISTM that if we go down this path, we should split (not now, but either > > still in v12, or *early* in v13), the sets of indexes that are intended > > to a) not being used for catalog queries b) may be skipped for

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund writes: > ISTM that if we go down this path, we should split (not now, but either > still in v12, or *early* in v13), the sets of indexes that are intended > to a) not being used for catalog queries b) may be skipped for index > insertions. It seems pretty likely that somebody will

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Andres Freund
Hi, On 2019-05-02 10:49:00 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2019-05-01 22:01:53 -0400, Tom Lane wrote: > >> I think that argument is pretty pointless considering that "REINDEX TABLE > >> pg_class" does it this way, and that code is nearly old enough to > >> vote. > > > IMO

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund writes: > On 2019-05-01 00:43:34 -0400, Tom Lane wrote: >> ... What it's really currently doing at the >> moment of the deadlock is cleaning out its temporary schema after the >> client disconnected. > I'm inclined to remove the tests from the backbranches, once we've > committed

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Andres Freund
Hi, On 2019-05-01 00:43:34 -0400, Tom Lane wrote: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust=2019-05-01%2003%3A12%3A13 > > The relevant bit of log is > > 2019-05-01 05:24:47.527 CEST [97690:429] pg_regress/create_view LOG: > statement: DROP SCHEMA temp_view_test CASCADE;

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-02 Thread Tom Lane
Andres Freund writes: > On 2019-05-01 22:01:53 -0400, Tom Lane wrote: >> I think that argument is pretty pointless considering that "REINDEX TABLE >> pg_class" does it this way, and that code is nearly old enough to >> vote. > IMO the reindex_relation() case isn't comparable. IMV it's the exact

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-01 Thread Andres Freund
Hi, On 2019-05-01 22:01:53 -0400, Tom Lane wrote: > Andres Freund writes: > > Well, as I said before, I think hiding the to-be-rebuilt index from the > > list of indexes is dangerous too - if somebody added an actual > > CatalogUpdate/Insert (rather than inplace_update) anywhere along the > >

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-01 Thread Tom Lane
Andres Freund writes: > On 2019-05-01 19:41:24 -0400, Tom Lane wrote: >> OK, so per the other thread, it seems like the error recovery problem >> isn't going to affect this directly. However, I still don't like this >> proposal much; the reason being that it's a rather fundamental change >> in

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-01 Thread Andres Freund
Hi, On 2019-05-01 19:41:24 -0400, Tom Lane wrote: > I wrote: > > Andres Freund writes: > >> On 2019-05-01 10:06:03 -0700, Andres Freund wrote: > >>> I'm not sure this is the right short-term answer. Why isn't it, for now, > >>> sufficient to do what I suggested with RelationSetNewRelfilenode()

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-01 Thread Tom Lane
I wrote: > Andres Freund writes: >> On 2019-05-01 10:06:03 -0700, Andres Freund wrote: >>> I'm not sure this is the right short-term answer. Why isn't it, for now, >>> sufficient to do what I suggested with RelationSetNewRelfilenode() not >>> doing the CommandCounterIncrement(), and

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-01 Thread Andres Freund
Hi, On 2019-05-01 10:21:15 -0700, Andres Freund wrote: > FWIW, the dirty-hack version (attached) of the CommandCounterIncrement() > approach fixes the issue for a REINDEX pg_class_oid_index; in solation > even when using CCA. Started a whole CCA testrun with it, but the > results of that will

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-01 Thread Tom Lane
Andres Freund writes: > On 2019-05-01 10:06:03 -0700, Andres Freund wrote: >> I'm not sure this is the right short-term answer. Why isn't it, for now, >> sufficient to do what I suggested with RelationSetNewRelfilenode() not >> doing the CommandCounterIncrement(), and reindex_index() then doing

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-01 Thread Andres Freund
On 2019-05-01 10:06:03 -0700, Andres Freund wrote: > I'm not sure this is the right short-term answer. Why isn't it, for now, > sufficient to do what I suggested with RelationSetNewRelfilenode() not > doing the CommandCounterIncrement(), and reindex_index() then doing the > SetReindexProcessing()

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-01 Thread Andres Freund
Hi, On 2019-05-01 12:20:22 -0400, Tom Lane wrote: > Andres Freund writes: > > I see the bug. Turns out we need to figure out another way to solve the > > assertion triggered by doing catalog updates within > > RelationSetNewRelfilenode() - we can't just move the > > SetReindexProcessing() before

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-01 Thread Tom Lane
I wrote: > Did you figure out why this doesn't also happen in HEAD? ... actually, HEAD *is* broken with CCA, just differently. I'm on it. regards, tom lane

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-05-01 Thread Tom Lane
Andres Freund writes: > I see the bug. Turns out we need to figure out another way to solve the > assertion triggered by doing catalog updates within > RelationSetNewRelfilenode() - we can't just move the > SetReindexProcessing() before it. When CCA is enabled, the > CommandCounterIncrement()

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Tom Lane
Andres Freund writes: > On 2019-04-30 15:53:07 -0700, Andres Freund wrote: >> I'll move the test into a new "reindex_catalog" test, with a comment >> explaining that the failure cases necessitating that are somewhere >> between bugs, ugly warts, an hard to fix edge cases. > Just pushed that.

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Andres Freund
On 2019-04-30 15:53:07 -0700, Andres Freund wrote: > Hi, > > On 2019-04-30 18:42:36 -0400, Tom Lane wrote: > > markhor just reported in with results showing that we have worse > > problems than deadlock-prone tests in the back branches: 9.4 > > for example looks like > > > -- whole tables > >

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Andres Freund
Hi, On 2019-04-30 18:42:36 -0400, Tom Lane wrote: > markhor just reported in with results showing that we have worse > problems than deadlock-prone tests in the back branches: 9.4 > for example looks like > -- whole tables > REINDEX TABLE pg_class; -- mapped, non-shared, critical > + ERROR:

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Tom Lane
Just when you thought it was safe to go back in the water ... markhor just reported in with results showing that we have worse problems than deadlock-prone tests in the back branches: 9.4 for example looks like -- -- whole tables REINDEX TABLE pg_class; -- mapped, non-shared, critical +

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Tom Lane
Andres Freund writes: > I'm not wild to go for a separate TAP test. A separate initdb cycle for > a a tests that takes about 30ms seems a bit over the top. Fair enough. > So I'm > inclined to either try running it in a serial step on the buildfarm > (survived a few dozen cycles with

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Andres Freund
Hi, On 2019-04-30 15:11:43 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2019-04-30 14:41:00 -0400, Tom Lane wrote: > > > On 2019-04-30 12:03:08 -0700, Andres Freund wrote: > > > > This is a pretty finnicky area of the code, with obviously not enough > > > > test coverage. I'm inclined

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Tom Lane
Andres Freund writes: > On 2019-04-30 14:41:00 -0400, Tom Lane wrote: >> I think trying to get this "working" is a v13 task now. We've obviously >> never tried to stress the case before, so you're neither fixing a >> regression nor fixing a new-in-v12 issue. > Well, the test *do* test that a

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Andres Freund
Hi, On 2019-04-30 14:41:00 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2019-04-30 14:05:50 -0400, Tom Lane wrote: > >> Possibly we could run them in a TAP test that configures a cluster > >> with autovac disabled? > > > Hm. Would it be sufficient to instead move them to a

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Andres Freund
Hi, On 2019-04-30 14:05:50 -0400, Tom Lane wrote: > Andres Freund writes: > > It's the lock-upgrade problem I theorized about > > upthread. ReindexIndex(), via RangeVarCallbackForReindexIndex(), takes a > > ShareLock on pg_class, and then goes on to upgrade to RowExclusiveLock > > in

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Tom Lane
Andres Freund writes: > It's the lock-upgrade problem I theorized about > upthread. ReindexIndex(), via RangeVarCallbackForReindexIndex(), takes a > ShareLock on pg_class, and then goes on to upgrade to RowExclusiveLock > in RelationSetNewRelfilenode(). But at that time another session >

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Andres Freund
On 2019-04-30 11:51:10 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2019-04-30 00:50:20 -0400, Tom Lane wrote: > > I suspect the problem isn't REINDEX INDEX in general, it's REINDEX INDEX > > over catalog tables modified during reindex. > > So far, every one of the failures in the

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Tom Lane
I wrote: > I haven't been able to reproduce this locally yet, but my guess is that > the REINDEX wants to update some row that was already updated by the > concurrent transaction, so it has to wait to see if the latter commits > or not. And, of course, waiting while holding AccessExclusiveLock on

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Tom Lane
Andres Freund writes: > On 2019-04-30 00:50:20 -0400, Tom Lane wrote: >> Hm? REINDEX INDEX is deadlock-prone by definition, because it starts >> by opening/locking the index and then it has to open/lock the index's >> table. Every other operation locks tables before their indexes. > We claim

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-30 Thread Andres Freund
Hi, On 2019-04-30 00:50:20 -0400, Tom Lane wrote: > Andres Freund writes: > > On April 29, 2019 9:37:33 PM PDT, Tom Lane wrote: > >> Seems like putting reindexes of pg_class into a test script that runs > >> in parallel with other DDL wasn't a hot idea. > > > Saw that. Will try to reproduce

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-29 Thread Tom Lane
Andres Freund writes: > On April 29, 2019 9:37:33 PM PDT, Tom Lane wrote: >> Seems like putting reindexes of pg_class into a test script that runs >> in parallel with other DDL wasn't a hot idea. > Saw that. Will try to reproduce (and if necessary either run separately or > revert). But isn't

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-29 Thread Andres Freund
Hi, On April 29, 2019 9:37:33 PM PDT, Tom Lane wrote: >Andres Freund writes: >> I've pushed the master bits, and the other branches are running >> check-world right now and I'll push soon unless something breaks >(it's a >> bit annoying that <= 9.6 can't run check-world in parallel...). >

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-29 Thread Tom Lane
Andres Freund writes: > I've pushed the master bits, and the other branches are running > check-world right now and I'll push soon unless something breaks (it's a > bit annoying that <= 9.6 can't run check-world in parallel...). Seems like putting reindexes of pg_class into a test script that

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-29 Thread Andres Freund
Hi, On 2019-04-29 15:09:24 -0700, Andres Freund wrote: > On 2019-04-29 18:07:07 -0400, Tom Lane wrote: > > I wrote: > > > Andres Freund writes: > > >> Taking this as a WIP, what do you think? > > > > > Seems generally about right. > > > > Andres, are you pushing this forward? Next week's

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-29 Thread Andres Freund
Hi, On 2019-04-29 18:07:07 -0400, Tom Lane wrote: > I wrote: > > Andres Freund writes: > >> Taking this as a WIP, what do you think? > > > Seems generally about right. > > Andres, are you pushing this forward? Next week's minor releases > are coming up fast, and we're going to need to adapt

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-29 Thread Tom Lane
I wrote: > Andres Freund writes: >> Taking this as a WIP, what do you think? > Seems generally about right. Andres, are you pushing this forward? Next week's minor releases are coming up fast, and we're going to need to adapt the HEAD patch significantly for the back branches AFAICS. So

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-26 Thread Tom Lane
Andres Freund writes: > On 2019-04-25 17:12:33 -0400, Tom Lane wrote: >> Andres Freund writes: >>> I was wondering if we should just pass in the pg_class tuple as an "out" >>> argument, instead of pointers to relfilnode/relfrozenxid/relminmxid. >> Yeah, possibly. The whole business with xids

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Michael Paquier
On Thu, Apr 25, 2019 at 11:32:21AM -0400, Tom Lane wrote: > What you have to do to get it to crash is to ensure that > RelationSetNewRelfilenode's update of pg_class will be a non-HOT > update. You can try to set that up with "vacuum full pg_class" > but it turns out that that tends to leave the

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Andres Freund
Hi, On 2019-04-25 17:12:33 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2019-04-25 16:02:03 -0400, Tom Lane wrote: > >> That could work. The important API spec is then that the relcache entry > >> reflects the *previous* state of the relation, and is not to be modified > >> by the

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Andres Freund
Hi, On 2019-04-25 17:12:33 -0400, Tom Lane wrote: > Andres Freund writes: > > My point was that given the current coding the code in > > ATExecSetTableSpace() would make changes to the *old* relfilenode, after > > having already copied the contents to the new relfilenode. Which means > > that if

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Tom Lane
Andres Freund writes: > On 2019-04-25 16:02:03 -0400, Tom Lane wrote: >> That could work. The important API spec is then that the relcache entry >> reflects the *previous* state of the relation, and is not to be modified >> by the tableam call. > Right. > I was wondering if we should just pass

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Andres Freund
Hi, On 2019-04-25 16:02:03 -0400, Tom Lane wrote: > > Currently the only thing that table_relation_set_new_filenode() accesses > > that already is updated is the RelFileNode. I wonder if we shouldn't > > change the API so that table_relation_set_new_filenode() will get a > > relcache entry

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Tom Lane
Andres Freund writes: > On 2019-04-25 14:50:09 -0400, Tom Lane wrote: >> As far as I can see, there is no API restriction on what parts of the >> relcache entry it may presume are valid. It *certainly* thinks that >> rd_rel is valid, which is rather at odds with the fact that this has >> to be

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Andres Freund
Hi, On 2019-04-25 12:29:16 -0400, Tom Lane wrote: > I wrote: > > The problem here is that RelationSetNewRelfilenode is aggressively > > changing the index's relcache entry before it's written out the > > updated tuple, so that the tuple update tries to make an index > > entry in the new storage

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Tom Lane
I wrote: > The problem here is that RelationSetNewRelfilenode is aggressively > changing the index's relcache entry before it's written out the > updated tuple, so that the tuple update tries to make an index > entry in the new storage which isn't filled yet. I think we can > fix it by *not*

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Tom Lane
Michael Paquier writes: > On Tue, Apr 23, 2019 at 08:03:37PM -0400, Tom Lane wrote: >> Oh! One gets you ten it "works" as long as the pg_class update is a >> HOT update, so that we don't actually end up touching the indexes. > I have been able to spend a bit more time testing and looking at the

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Alvaro Herrera
On 2019-Apr-25, Michael Paquier wrote: > 2) Bisecting between the merge base points of REL9_4_STABLE/master and > REL9_5_STABLE/master, I am being pointed to the introduction of > replication origins: > commit: 5aa2350426c4fdb3d04568b65aadac397012bbcb > author: Andres Freund > date: Wed, 29 Apr

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Michael Paquier
On Tue, Apr 23, 2019 at 08:03:37PM -0400, Tom Lane wrote: > Oh! One gets you ten it "works" as long as the pg_class update is a > HOT update, so that we don't actually end up touching the indexes. > This explains why the crash is less likely to happen in a database > where one's done some work

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-25 Thread Shaoqi Bai
On Wed, Apr 24, 2019 at 7:55 AM Tom Lane wrote: > Michael Paquier writes: > > On Tue, Apr 23, 2019 at 04:47:19PM -0400, Tom Lane wrote: > >> Is there some precondition you're not mentioning? > > > Hm. In my own init scripts, I create a new database just after > > starting the instance. > > Ah,

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-23 Thread Michael Paquier
On Tue, Apr 23, 2019 at 07:54:52PM -0400, Tom Lane wrote: > That seems like pretty much of a hack :-(, in particular I'm not > convinced that we'd not end up with a missing index entry afterwards. > Maybe it's the only way, but I think first we need to trace down > exactly why this broke. I

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-23 Thread Tom Lane
I wrote: > It also seems quite odd that it doesn't fail every time; surely it's > not conditional whether we'll try to insert a new pg_class tuple or not? > We need to understand that, too. Oh! One gets you ten it "works" as long as the pg_class update is a HOT update, so that we don't actually

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-23 Thread Tom Lane
Michael Paquier writes: > On Tue, Apr 23, 2019 at 04:47:19PM -0400, Tom Lane wrote: >> Is there some precondition you're not mentioning? > Hm. In my own init scripts, I create a new database just after > starting the instance. Ah, there we go: regression=# create database d1; CREATE DATABASE

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-23 Thread Michael Paquier
On Tue, Apr 23, 2019 at 04:47:19PM -0400, Tom Lane wrote: > regression=# reindex index pg_class_relname_nsp_index; > REINDEX > regression=# reindex index pg_class_oid_index; > REINDEX > regression=# reindex index pg_class_tblspc_relfilenode_index; > REINDEX > regression=# reindex table pg_class;

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-23 Thread Alvaro Herrera
On 2019-Apr-18, Michael Paquier wrote: > Fujii-san has sent me a report offline about REINDEX. And since 9.6, > trying to REINDEX directly an index of pg_class fails lamentably on an > assertion failure (mbsync failure if bypassing the assert): Hmm, yeah, I ran into this crash too, more than a

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-23 Thread Tom Lane
Michael Paquier writes: > Fujii-san has sent me a report offline about REINDEX. And since 9.6, > trying to REINDEX directly an index of pg_class fails lamentably on an > assertion failure (mbsync failure if bypassing the assert): So ... I can't reproduce this on HEAD. Nor the back branches.

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

2019-04-22 Thread Michael Paquier
On Thu, Apr 18, 2019 at 10:14:30AM +0900, Michael Paquier wrote: > Doing a REINDEX TABLE directly on pg_class proves to work correctly, > and CONCURRENTLY is not supported for catalog tables. > > Bisecting my way through it, the first commit causing the breakage is > that: > commit: