On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote: > At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote: >> >> I recall I had proposed a fix for the same thing some time ago . > > Wow. About 1.5 years ago and stuck? I have a concrete case where > this harms so I'd like to fix that this time. How can we move on?
Agreed that this should be fixed. Your proposed approach #1 to recheck the inheritance after obtaining the lock on the child table seems to be a good way forward. Approach #2 of reordering locking is a simpler solution, but does not completely prevent the problem, because DROP TABLE child can still cause it to occur, as you mentioned. >>> The cause is that NO INHERIT doesn't take an exlusive lock on the >>> parent. This allows expand_inherited_rtentry to add the child >>> relation into appendrel after removal from the inheritance but >>> still exists. >> >> Right. >> >>> I see two ways to fix this. >>> >>> The first patch adds a recheck of inheritance relationship if the >>> corresponding attribute is missing in the child in >>> make_inh_translation_list(). The recheck is a bit complex but it >>> is not performed unless the sequence above is happen. It checks >>> duplication of relid (or cycles in inheritance) following >>> find_all_inheritors (but doing a bit different) but I'm not sure >>> it is really useful. >> >> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but >> I guess your hash table based solution will do the job as far as >> performance of this check is concerned, although I haven't checked the >> code closely. > > The hash table is not crucial in the patch. Substantially it just > recurs down looking up pg_inherits for the child. I considered > the new index but abandoned because I thought that such case > won't occur so frequently. Agreed. BTW, the hash table in patch #1 does not seem to be really helpful. In the while loop in is_descendant_of_internal(), does hash_search() ever return found = true? AFAICS, it does not. >>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on >>> the parent first. >>> >>> Since the latter has a larger impact on the current behavior and >>> we already treat "DROP TABLE child" case in the similar way, I >>> suppose that the first approach would be preferable. >> >> That makes sense. >> >> BTW, in the partitioned table case, the parent is always locked first >> using an AccessExclusiveLock. There are other considerations in that case >> such as needing to recreate the partition descriptor upon termination of >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases). > > Apart from the degree of concurrency, if we keep parent->children > order of locking, such recreation does not seem to be > needed. Maybe I'm missing something. Sorry to have introduced that topic in this thread, but I will try to explain anyway why things are the way they are currently: Once a table is no longer a partition of the parent (detached or dropped), we must make sure that the next commands in the transaction don't see it as one. That information is currently present in the relcache (rd_partdesc), which is used by a few callers, most notably the tuple-routing code. Next commands must recreate the entry so that the correct thing happens based on the updated information. More precisely, we must invalidate the current entry. RelationClearRelation() will either delete the entry or rebuild it. If it's being referenced somewhere, it will be rebuilt. The place holding the reference may also be looking at the content of rd_partdesc, which we don't require them to make a copy of, so we must preserve its content while other fields of RelationData are being read anew from the catalog. We don't have to preserve it if there has been any change (partition added/dropped), but to make such a change one would need to take a strong enough lock on the relation (parent). We assume here that anyone who wants to reference rd_partdesc takes at least AccessShareLock lock on the relation, and anyone who wants to change its content must take a lock that will conflict with it, so AccessExclusiveLock. Note that in all of this, we are only talking about one relation, that is the parent, so parent -> child ordering of taking locks may be irrelevant. Thanks, Amit -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers