On 2018-08-08 01:23:51 +1200, David Rowley wrote:
> On 8 August 2018 at 00:47, Andres Freund <and...@anarazel.de> wrote:
> > On 2018-08-08 00:40:12 +1200, David Rowley wrote:
> >> 1. Obtain a ShareUpdateExclusiveLock on the partitioned table rather
> >> than an AccessExclusiveLock.
> >> 2. Do all the normal partition attach partition validation.
> >> 3. Insert pg_partition record with partvalid = true.
> >> 4. Invalidate relcache entry for the partitioned table
> >> 5. Any loops over a partitioned table's PartitionDesc must check
> >> PartitionIsValid(). This will return true if the current snapshot
> >> should see the partition or not. The partition is valid if partisvalid
> >> = true and the xmin precedes or is equal to the current snapshot.
> >
> > How does this protect against other sessions actively using the relcache
> > entry? Currently it is *NOT* safe to receive invalidations for
> > e.g. partitioning contents afaics.
>
> I'm not proposing that sessions running older snapshots can't see that
> there's a new partition. The code I have uses PartitionIsValid() to
> test if the partition should be visible to the snapshot. The
> PartitionDesc will always contain details for all partitions stored in
> pg_partition whether they're valid to the current snapshot or not. I
> did it this way as there's no way to invalidate the relcache based on
> a point in transaction, only a point in time.

I don't think that solves the problem that an arriving relcache
invalidation would trigger a rebuild of rd_partdesc while it is still
referenced by running code.

You'd need to build infrastructure to prevent that.

One approach would be to make sure that everything relying on
rd_partdesc staying the same stores its value in a local variable, and
then *not* free the old version of rd_partdesc (etc) when the
refcount > 0, but delay that to the RelationClose() that makes the
refcount reach 0. That'd be the start of a framework for more such
concurrency handling.

Regards,

Andres Freund
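
For illustration, here is a minimal standalone sketch in C of the deferred-free idea described above. It is not PostgreSQL code: the Rel and PartDesc structs, the rel_* functions, and the nfreed counter are all hypothetical stand-ins, and the real relcache would need considerably more (memory contexts, error cleanup, and so on). It only shows the ordering: an invalidation swaps in a new descriptor, and the old one is parked rather than freed while the refcount is above zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a partition descriptor. */
typedef struct PartDesc {
    int nparts;
    struct PartDesc *next_retired;  /* links superseded versions */
} PartDesc;

/* Hypothetical stand-in for a relcache entry. */
typedef struct Rel {
    int refcount;       /* number of open references */
    PartDesc *partdesc; /* current version */
    PartDesc *retired;  /* old versions that callers may still hold */
    int nfreed;         /* demo-only counter of freed descriptors */
} Rel;

PartDesc *build_partdesc(int nparts)
{
    PartDesc *pd = malloc(sizeof(PartDesc));
    if (pd == NULL)
        abort();
    pd->nparts = nparts;
    pd->next_retired = NULL;
    return pd;
}

void rel_init(Rel *rel, int nparts)
{
    rel->refcount = 0;
    rel->partdesc = build_partdesc(nparts);
    rel->retired = NULL;
    rel->nfreed = 0;
}

/* Callers keep the returned pointer in a local variable. */
PartDesc *rel_open(Rel *rel)
{
    rel->refcount++;
    return rel->partdesc;
}

/* Invalidation: rebuild, but only free the old version if nobody holds it. */
void rel_invalidate(Rel *rel, int new_nparts)
{
    PartDesc *old = rel->partdesc;

    rel->partdesc = build_partdesc(new_nparts);
    if (rel->refcount > 0) {
        old->next_retired = rel->retired;  /* park it for later */
        rel->retired = old;
    } else {
        free(old);
        rel->nfreed++;
    }
}

/* The close that drops the refcount to 0 frees every parked version. */
void rel_close(Rel *rel)
{
    assert(rel->refcount > 0);
    if (--rel->refcount == 0) {
        while (rel->retired != NULL) {
            PartDesc *next = rel->retired->next_retired;
            free(rel->retired);
            rel->retired = next;
            rel->nfreed++;
        }
    }
}
```

A caller that did rel_open(), saw an invalidation arrive, and kept using its local pointer would still be reading the old descriptor safely; the memory is only released by the rel_close() that brings the refcount to 0. One simplification to note: parked versions are released only when all references are gone, not per holder, which is the coarse behaviour the approach above describes.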