On Tue, Jan 29, 2019 at 1:59 PM Robert Haas <robertmh...@gmail.com> wrote:
> I don't know how to reduce this to something reliable enough to
> include it in the regression tests, and maybe we don't really need
> that, but it's good to know that this is not a purely theoretical
> problem. I think next I'll try to write some code to make
> execPartition.c able to cope with the situation when it arises.
OK, that seems to be pretty easy. New patch series attached. The patch with that new logic is 0004 (a toy sketch of the general idea appears below the signature). I've consolidated some of the things I had as separate patches in my last post and rewritten the commit messages to explain more clearly the purpose of each patch.

Open issues:

- For now, I haven't tried to handle the DETACH PARTITION case. I don't think there's anything preventing someone - possibly even me - from implementing the counter-based approach that I described in the previous message, but I think it would be good to have some more discussion first on whether it's acceptable to make concurrent queries error out. I think any queries that were already up and running would be fine, but any that were planned before the DETACH and tried to execute afterwards would get an ERROR (a toy sketch of such a check also appears below the signature). That's fairly low-probability, because normally the shared invalidation machinery would cause replanning, but there's a race condition, so we'd have to document something like: if you use this feature, it'll probably just work, but you might get some funny errors in other sessions if you're unlucky. That kinda sucks, but maybe we should just suck it up. Possibly we should consider making the concurrent behavior optional, so that if you'd rather take blocking locks than risk errors, you have that option. Of course, I guess you could also just let people do an explicit LOCK TABLE if that's what they want. Or we could try to actually make it work in that case, I guess by ignoring the detached partitions, but that seems a lot harder.

- 0003 doesn't have any handling for parallel query at this point, so even though within a single backend a single query execution will always get the same PartitionDesc for the same relation, the answers might not be consistent across the parallel group. I keep going back and forth on whether this really matters. It's too late to modify the plan, so any relations attached since it was generated are not going to get scanned. As for detached relations, we're talking about making them error out, so we don't have to worry about different backends coming to different conclusions about whether they should be scanned. But maybe we should try to be smarter instead. One concern is that even relations that aren't scanned could still be affected because of tuple routing, but right now parallel queries can't INSERT or UPDATE rows anyway. Then again, maybe we should try not to add more obstacles in the way of lifting that restriction. Then again again, AFAICT we wouldn't be able to test that the new code is actually solving a problem today, and how much untested code do we really want in the tree? And then on the eleventh hand, maybe there are other reasons why it's important to use the same PartitionDesc across all parallel workers that I'm not thinking about at the moment.

- 0003 also changes the order in which locks are acquired. I am not sure whether we care about this, especially in view of other pending changes.

If you know of other problems, have solutions to or opinions about these, or think the whole approach is wrong, please speak up!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
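[Editor's note: for illustration, here is a standalone toy sketch in plain C of one plausible way executor-time pruning could cope with a PartitionDesc that has gained partitions since planning. It is not the actual patch code and every name in it is invented: plan-time partition OIDs are merged against the execution-time OID list, so a concurrently attached partition simply gets no subplan and a missing (detached) partition is detected. Both OID lists are assumed to keep the same relative order for pre-existing partitions, as a PartitionDesc in bound order would.]

/*
 * Toy model, not PostgreSQL code.  All names are hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>

typedef unsigned int Oid;

/*
 * Build a map from execution-time partition index to the subplan index
 * chosen at plan time, or -1 if the partition was added after planning
 * (no subplan exists, so it is simply never scanned).  Both OID arrays
 * are assumed to be in partition-bound order.
 */
static int *
build_subplan_map(const Oid *plan_oids, int nplan,
                  const Oid *exec_oids, int nexec)
{
    int *map = malloc(sizeof(int) * nexec);
    int  pi = 0;

    for (int ei = 0; ei < nexec; ei++)
    {
        if (pi < nplan && plan_oids[pi] == exec_oids[ei])
            map[ei] = pi++;     /* same partition as at plan time */
        else
            map[ei] = -1;       /* concurrently attached: no subplan */
    }

    /*
     * Any plan-time partition we failed to match must have gone away
     * concurrently; in the scheme discussed above that would be an error.
     */
    if (pi < nplan)
    {
        fprintf(stderr, "partition with OID %u is no longer present\n",
                plan_oids[pi]);
        free(map);
        return NULL;
    }
    return map;
}

int
main(void)
{
    Oid  plan_oids[] = {16384, 16390};          /* seen at plan time */
    Oid  exec_oids[] = {16384, 16388, 16390};   /* 16388 attached since */
    int *map = build_subplan_map(plan_oids, 2, exec_oids, 3);

    if (map)
    {
        for (int i = 0; i < 3; i++)
            printf("exec partition %d -> subplan %d\n", i, map[i]);
        free(map);
    }
    return 0;
}

The real executor data structures are of course more involved; this only models the OID-merging step.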
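[Editor's note: likewise, for the DETACH bullet, a generic toy sketch of a counter-style check; this is purely illustrative and not necessarily the scheme described in the earlier message, and none of these names exist in PostgreSQL. The idea is only that a plan captures a per-table detach counter and execution errors out if that counter has advanced since planning.]

/*
 * Toy model, not PostgreSQL code.  All names are hypothetical.
 */
#include <stdio.h>

typedef struct ToyTable
{
    unsigned long detach_counter;   /* bumped by each concurrent detach */
} ToyTable;

typedef struct ToyPlan
{
    unsigned long detach_counter_at_plan;   /* value captured at plan time */
} ToyPlan;

static void
toy_plan(ToyPlan *plan, const ToyTable *table)
{
    plan->detach_counter_at_plan = table->detach_counter;
}

/* Returns 0 on success, -1 if the plan is no longer safe to run. */
static int
toy_execute(const ToyPlan *plan, const ToyTable *table)
{
    if (table->detach_counter != plan->detach_counter_at_plan)
    {
        fprintf(stderr,
                "ERROR: a partition was detached after this query was planned\n");
        return -1;
    }
    printf("plan executed against an unchanged set of partitions\n");
    return 0;
}

int
main(void)
{
    ToyTable table = { .detach_counter = 0 };
    ToyPlan  plan;

    toy_plan(&plan, &table);
    toy_execute(&plan, &table);     /* fine: nothing detached */

    table.detach_counter++;         /* simulate a concurrent DETACH */
    toy_execute(&plan, &table);     /* errors out, as described above */
    return 0;
}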
Attachments:
0001-Move-code-for-managing-PartitionDescs-into-a-new-fil.patch
0002-Ensure-that-RelationBuildPartitionDesc-sees-a-consis.patch
0003-Ensure-that-repeated-PartitionDesc-lookups-return-th.patch
0004-Teach-runtime-partition-pruning-to-cope-with-concurr.patch
0005-Reduce-the-lock-level-required-to-attach-a-partition.patch