On Tue, Nov 6, 2018 at 5:09 PM Robert Haas <robertmh...@gmail.com> wrote: > - get_partition_dispatch_recurse and ExecCreatePartitionPruneState > both call RelationGetPartitionDesc. Presumably, this means that if > the partition descriptor gets updated on the fly, the tuple routing > and partition dispatch code could end up with different ideas about > which partitions exist. I think this should be fixed somehow, so that > we only call RelationGetPartitionDesc once per query and use the > result for everything.
I think there is deeper trouble here. ExecSetupPartitionTupleRouting() calls find_all_inheritors() to acquire RowExclusiveLock on the whole partitioning hierarchy. It then calls RelationGetPartitionDispatchInfo (as a non-relcache function, this seems poorly named) which calls get_partition_dispatch_recurse, which does this: /* * We assume all tables in the partition tree were already locked * by the caller. */ Relation partrel = heap_open(partrelid, NoLock); That seems OK at present, because no new partitions can have appeared since ExecSetupPartitionTupleRouting() acquired locks. But if we allow new partitions to be added with only ShareUpdateExclusiveLock, then I think there would be a problem. If a new partition OID creeps into the partition descriptor after find_all_inheritors() and before we fetch its partition descriptor, then we wouldn't have previously taken a lock on it and would still be attempting to open it without a lock, which is bad (cf. b04aeb0a053e7cf7faad89f7d47844d8ba0dc839). Admittedly, it might be a bit hard to provoke a failure here because I'm not exactly sure how you could trigger a relcache reload in the critical window, but I don't think we should rely on that. More generally, it seems a bit strange that we take the approach of locking the entire partitioning hierarchy here regardless of which relations the query actually knows about. If some relations have been pruned, presumably we don't need to lock them; if/when we permit concurrent partition, we don't need to lock any new ones that have materialized. We're just going to end up ignoring them anyway because there's nothing to do with the information that they are or are not excluded from the query when they don't appear in the query plan in the first place. Furthermore, this whole thing looks suspiciously like more of the sort of redundant locking that f2343653f5b2aecfc759f36dbb3fd2a61f36853e attempted to eliminate. In light of that commit message, I'm wondering whether the best approach would be to [1] get rid of the find_all_inheritors call altogether and [2] somehow ensure that get_partition_dispatch_recurse() doesn't open any tables that aren't part of the query's range table. Thoughts? Comments? Ideas? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company