Hi, On 2019-04-25 16:02:03 -0400, Tom Lane wrote: > > Currently the only thing that table_relation_set_new_filenode() accesses > > that already is updated is the RelFileNode. I wonder if we shouldn't > > change the API so that table_relation_set_new_filenode() will get a > > relcache entry *without* any updates passed in, then internally does > > GetNewRelFileNode() (if so desired by the AM), and returns the new rnode > > via a new out parameter. > > That could work. The important API spec is then that the relcache entry > reflects the *previous* state of the relation, and is not to be modified > by the tableam call.
Right. I was wondering if we should just pass in the pg_class tuple as an "out" argument, instead of pointers to relfilnode/relfrozenxid/relminmxid. > > We don't currently allow that, but as far as I can see the current > > coding of ATExecSetTableSpace() also has bad problems with system > > catalog updates. It copies the data and *then* does > > CatalogTupleUpdate(), but *witout* updating the reclache - which ijust > > would cause the update to be lost. > > Well, I imagine it's expecting the subsequent CCI to update the relcache > entry, which I think is correct behavior in this worldview. We're > basically trying to make the relcache state follow transaction/command > boundary semantics. My point was that given the current coding the code in ATExecSetTableSpace() would make changes to the *old* relfilenode, after having already copied the contents to the new relfilenode. Which means that if ATExecSetTableSpace() is ever used on pg_class or one of it's indexes, it'd just loose those changes, afaict. > > I could come up with a patch for that if you want me to. > > I'm happy to let you take a whack at it if you want. I'll give it a whack (after writing one more email on a separate but loosely related topic). Greetings, Andres Freund