Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
At Fri, 15 Sep 2017 15:36:26 +0900, Amit Langote wrote in > Hi. > > On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote: > > << the following is another topic >> > > > BTW, in the partitioned table case, the parent is always locked first > using an AccessExclusiveLock. There are other considerations in that > case > such as needing to recreate the partition descriptor upon termination of > inheritance (both the DETACH PARTITION and also DROP TABLE child cases). > >>> > >>> Apart from the degree of concurrency, if we keep parent->children > >>> order of locking, such recreation does not seem to be > >>> needed. Maybe I'm missing something. > >> > >> Sorry to have introduced that topic in this thread, but I will try to > >> explain anyway why things are the way they are currently: > >> > >> Once a table is no longer a partition of the parent (detached or dropped), > >> we must make sure that the next commands in the transaction don't see it > >> as one. That information is currently present in the relcache > >> (rd_partdesc), which is used by a few callers, most notably the > >> tuple-routing code. Next commands must recreate the entry so that the > >> correct thing happens based on the updated information. More precisely, > >> we must invalidate the current entry. RelationClearRelation() will either > >> delete the entry or rebuild it. If it's being referenced somewhere, it > >> will be rebuilt. The place holding the reference may also be looking at > >> the content of rd_partdesc, which we don't require them to make a copy of, > >> so we must preserve its content while other fields of RelationData are > >> being read anew from the catalog. We don't have to preserve it if there > >> has been any change (partition added/dropped), but to make such a change > >> one would need to take a strong enough lock on the relation (parent). We > >> assume here that anyone who wants to reference rd_partdesc takes at least > >> AccessShareLock lock on the relation, and anyone who wants to change its > >> content must take a lock that will conflict with it, so > >> AccessExclusiveLock. Note that in all of this, we are only talking about > >> one relation, that is the parent, so parent -> child ordering of taking > >> locks may be irrelevant. > > > > I think I understand this, anyway DropInherit and DropPartition > > is different-but-at-the-same-level operations so surely needs > > amendment for drop/detach cases. Is there already a solution? Or > > reproducing steps? > > Sorry, I think I forgot to reply to this. Since you seem to have chosen > the other solution (checking that child is still a child), maybe this > reply is a bit too late, but anyway. I choosed it at that time for the reason mentioned upthread, but haven't decided which is better. > DropInherit or NO INHERIT is seen primarily as changing a child table's > (which is the target table of the command) property that it is no longer a > child of the parent, so we lock the child table to block concurrent > operations from considering it a child of parent anymore. The fact that > parent is locked after the child and with ShareUpdateExclusiveLock instead > of AccessExclusiveLock, we observe this race condition when SELECTing from > the parent. > > DropPartition or DETACH PARTITION is seen primarily as changing the parent > table's (which is the target table of the command) property that one of > the partitions is removed, so we lock the parent. Any concurrent > operations that rely on the parent's relcache to get the partition list > will wait for the session that is dropping the partition to finish, so > that they get the fresh information from the relcache (or more importantly > do not end up with information obtained from the relcache going invalid > under them without notice). Note that the lock on the partition/child is > also present and it plays more or less the the same role as it does in the > DropInherit case, but due to different order of locking, reported race > condition does not occur between SELECT on partitioned table and > DROP/DETACH PARTITION. Thank you for the explanation. I understand that the difference comes from which of parent and children has the information about inheritance/partitioning. DROP child and ALTER child NO INHERITS are (I think) the only two operations that intiated from children side. The parent-locking patch results in different solutions for similar problems. However, parent locking on DROPped child requires a new index InheritsRelidIndex to avoid full scan on pg_inherits, but the NO INHERITS case doesn't. This might be a good reason for the difference. I noticed that DROP TABLE partition acquires lock on the parent in a callback for RangeVarGetRelid(Extended). The same way seems reasonable for the NO INHERIT case. As the result the patch becomes far small and less invasive than the previous one. (dropinh_lock_parent_v2.patch) As a negative PoC, attacched dropchild_lock_parent
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
On 2017/09/15 15:36, Amit Langote wrote: > The fact that > parent is locked after the child and with ShareUpdateExclusiveLock instead > of AccessExclusiveLock, we observe this race condition when SELECTing from > the parent. Oops, I meant "parent table is locked with AccessShareLock instead of AccessExclusiveLock..." Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Hi. On 2017/08/28 18:28, Kyotaro HORIGUCHI wrote: > << the following is another topic >> > BTW, in the partitioned table case, the parent is always locked first using an AccessExclusiveLock. There are other considerations in that case such as needing to recreate the partition descriptor upon termination of inheritance (both the DETACH PARTITION and also DROP TABLE child cases). >>> >>> Apart from the degree of concurrency, if we keep parent->children >>> order of locking, such recreation does not seem to be >>> needed. Maybe I'm missing something. >> >> Sorry to have introduced that topic in this thread, but I will try to >> explain anyway why things are the way they are currently: >> >> Once a table is no longer a partition of the parent (detached or dropped), >> we must make sure that the next commands in the transaction don't see it >> as one. That information is currently present in the relcache >> (rd_partdesc), which is used by a few callers, most notably the >> tuple-routing code. Next commands must recreate the entry so that the >> correct thing happens based on the updated information. More precisely, >> we must invalidate the current entry. RelationClearRelation() will either >> delete the entry or rebuild it. If it's being referenced somewhere, it >> will be rebuilt. The place holding the reference may also be looking at >> the content of rd_partdesc, which we don't require them to make a copy of, >> so we must preserve its content while other fields of RelationData are >> being read anew from the catalog. We don't have to preserve it if there >> has been any change (partition added/dropped), but to make such a change >> one would need to take a strong enough lock on the relation (parent). We >> assume here that anyone who wants to reference rd_partdesc takes at least >> AccessShareLock lock on the relation, and anyone who wants to change its >> content must take a lock that will conflict with it, so >> AccessExclusiveLock. Note that in all of this, we are only talking about >> one relation, that is the parent, so parent -> child ordering of taking >> locks may be irrelevant. > > I think I understand this, anyway DropInherit and DropPartition > is different-but-at-the-same-level operations so surely needs > amendment for drop/detach cases. Is there already a solution? Or > reproducing steps? Sorry, I think I forgot to reply to this. Since you seem to have chosen the other solution (checking that child is still a child), maybe this reply is a bit too late, but anyway. DropInherit or NO INHERIT is seen primarily as changing a child table's (which is the target table of the command) property that it is no longer a child of the parent, so we lock the child table to block concurrent operations from considering it a child of parent anymore. The fact that parent is locked after the child and with ShareUpdateExclusiveLock instead of AccessExclusiveLock, we observe this race condition when SELECTing from the parent. DropPartition or DETACH PARTITION is seen primarily as changing the parent table's (which is the target table of the command) property that one of the partitions is removed, so we lock the parent. Any concurrent operations that rely on the parent's relcache to get the partition list will wait for the session that is dropping the partition to finish, so that they get the fresh information from the relcache (or more importantly do not end up with information obtained from the relcache going invalid under them without notice). Note that the lock on the partition/child is also present and it plays more or less the the same role as it does in the DropInherit case, but due to different order of locking, reported race condition does not occur between SELECT on partitioned table and DROP/DETACH PARTITION. By the way, I will take a look at your patch when I come back from the vacation. Meanwhile, I noticed that it needs another rebase after 0a480502b092 [1]. Thanks, Amit [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b092 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
At Wed, 13 Sep 2017 20:20:48 -0400, Robert Haas wrote in > On Mon, Jun 26, 2017 at 4:46 AM, Kyotaro HORIGUCHI > wrote: > > The cause is that NO INHERIT doesn't take an exlusive lock on the > > parent. This allows expand_inherited_rtentry to add the child > > relation into appendrel after removal from the inheritance but > > still exists. > > > > I see two ways to fix this. > > > > The first patch adds a recheck of inheritance relationship if the > > corresponding attribute is missing in the child in > > make_inh_translation_list(). The recheck is a bit complex but it > > is not performed unless the sequence above is happen. It checks > > duplication of relid (or cycles in inheritance) following > > find_all_inheritors (but doing a bit different) but I'm not sure > > it is really useful. > > > > > > The second patch lets ALTER TABLE NO INHERIT to acquire locks on > > the parent first. > > > > Since the latter has a larger impact on the current behavior and > > we already treat "DROP TABLE child" case in the similar way, I > > suppose that the first approach would be preferable. > > > > > > Any comments or thoughts? > > I agree that the second has less user impact, but I wonder if we can > think that will really fix the bug completely, or more generally if it > will fix all of the bugs that come from ALTER TABLE .. NO INHERIT not > locking the parent. I have a sneaking suspicion that may be wishful > thinking. Thanks for the comment. The recheck patch prevent planner from involving just-detached children while inheritance expansion. No simultaneous detatching of children doesn't affect the planning before the time. Once planner (the select side) gets lock on the child, the alter side cannot do anything until the select finishes. If the alter side won, the select side detects detaching immediately after the lock is released then excludes the children. No problem will occur ever after. Even in the case a child is replaced with another table, it is nothing different from simple detaching. As the result, I think that the recheck patch saves all possible problem caused by simultaneously detached children. However, the parent-locking patch is far smaller and it doesn't need such an explanation on its perfectness. If another problem occurs by simlultaneous detaching, it must haven't taken necessary locks in the right order. The most significant reason for my decision on this ptach was the fact that the DROP child case have been resolved by rechecking without parent locks, which is a kind of waste of resources if it could be resolved without it. (And I saw that the same solution is taken at least several places) I don't think there's noticeable difference of behavior (excluding performance) for users between the two solutions. Is there anyone who has an opinion on the point? regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
On Mon, Jun 26, 2017 at 4:46 AM, Kyotaro HORIGUCHI wrote: > The cause is that NO INHERIT doesn't take an exlusive lock on the > parent. This allows expand_inherited_rtentry to add the child > relation into appendrel after removal from the inheritance but > still exists. > > I see two ways to fix this. > > The first patch adds a recheck of inheritance relationship if the > corresponding attribute is missing in the child in > make_inh_translation_list(). The recheck is a bit complex but it > is not performed unless the sequence above is happen. It checks > duplication of relid (or cycles in inheritance) following > find_all_inheritors (but doing a bit different) but I'm not sure > it is really useful. > > > The second patch lets ALTER TABLE NO INHERIT to acquire locks on > the parent first. > > Since the latter has a larger impact on the current behavior and > we already treat "DROP TABLE child" case in the similar way, I > suppose that the first approach would be preferable. > > > Any comments or thoughts? I agree that the second has less user impact, but I wonder if we can think that will really fix the bug completely, or more generally if it will fix all of the bugs that come from ALTER TABLE .. NO INHERIT not locking the parent. I have a sneaking suspicion that may be wishful thinking. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
On 2017/09/13 12:05, Simon Riggs wrote: > On 26 June 2017 at 10:16, Amit Langote wrote: > >> BTW, in the partitioned table case, the parent is always locked first >> using an AccessExclusiveLock. There are other considerations in that case >> such as needing to recreate the partition descriptor upon termination of >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases). > > Is this requirement documented or in comments anywhere? Yes. See the last sentence in the description of PARTITION OF clause in CREATE TABLE: https://www.postgresql.org/docs/devel/static/sql-createtable.html#sql-createtable-partition And, the 4th point in the list of differences between declarative partitioning and inheritance: https://www.postgresql.org/docs/devel/static/ddl-partitioning.html#ddl-partitioning-implementation-inheritance Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
On 26 June 2017 at 10:16, Amit Langote wrote: > BTW, in the partitioned table case, the parent is always locked first > using an AccessExclusiveLock. There are other considerations in that case > such as needing to recreate the partition descriptor upon termination of > inheritance (both the DETACH PARTITION and also DROP TABLE child cases). Is this requirement documented or in comments anywhere? I can't see anything about that, which is a fairly major usage point. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
At Mon, 28 Aug 2017 18:28:07 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20170828.182807.98097766.horiguchi.kyot...@lab.ntt.co.jp> > I'll add this to CF2017-09. This patch got deadly atack from the commit 30833ba. I changed the signature of expand_single_inheritance_child in addition to make_inh_translation_list to notify that the specified child is no longer a child of the parent. This passes regular regression test and fixed the the problem. regards, -- Kyotaro Horiguchi NTT Open Source Software Center *** a/src/backend/catalog/pg_inherits.c --- b/src/backend/catalog/pg_inherits.c *** *** 42,47 typedef struct SeenRelsEntry --- 42,49 ListCell *numparents_cell; /* corresponding list cell */ } SeenRelsEntry; + static bool is_descendent_of_internal(Oid parentId, Oid childId, + HTAB *seen_rels); /* * find_inheritance_children * *** *** 400,402 typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId) --- 402,472 return result; } + + /* + * Check if the child is really a descendent of the parent + */ + bool + is_descendent_of(Oid parentId, Oid childId) + { + HTAB *seen_rels; + HASHCTL ctl; + bool ischild = false; + + memset(&ctl, 0, sizeof(ctl)); + ctl.keysize = sizeof(Oid); + ctl.entrysize = sizeof(Oid); + ctl.hcxt = CurrentMemoryContext; + + seen_rels = hash_create("is_descendent_of temporary table", + 32, /* start small and extend */ + &ctl, + HASH_ELEM | HASH_BLOBS | HASH_CONTEXT); + + ischild = is_descendent_of_internal(parentId, childId, seen_rels); + + hash_destroy(seen_rels); + + return ischild; + } + + static bool + is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels) + { + Relation inhrel; + SysScanDesc scan; + ScanKeyData key[1]; + bool ischild = false; + HeapTuple inheritsTuple; + + inhrel = heap_open(InheritsRelationId, AccessShareLock); + ScanKeyInit(&key[0], Anum_pg_inherits_inhparent, + BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(parentId)); + scan = systable_beginscan(inhrel, InheritsParentIndexId, true, + NULL, 1, key); + + while ((inheritsTuple = systable_getnext(scan)) != NULL) + { + bool found; + Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid; + + hash_search(seen_rels, &inhrelid, HASH_ENTER, &found); + + /* + * Recursively check into children. Although there can't theoretically + * be any cycles in the inheritance graph, check the cycles following + * find_all_inheritors. + */ + if (inhrelid == childId || + (!found && is_descendent_of_internal(inhrelid, childId, seen_rels))) + { + ischild = true; + break; + } + } + + systable_endscan(scan); + heap_close(inhrel, AccessShareLock); + + return ischild; + } *** a/src/backend/optimizer/prep/prepunion.c --- b/src/backend/optimizer/prep/prepunion.c *** *** 108,123 static void expand_partitioned_rtentry(PlannerInfo *root, LOCKMODE lockmode, bool *has_child, List **appinfos, List **partitioned_child_rels); ! static void expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, Index parentRTindex, Relation parentrel, PlanRowMark *parentrc, Relation childrel, bool *has_child, List **appinfos, List **partitioned_child_rels); ! static void make_inh_translation_list(Relation oldrelation, Relation newrelation, ! Index newvarno, ! List **translated_vars); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node, --- 108,122 LOCKMODE lockmode, bool *has_child, List **appinfos, List **partitioned_child_rels); ! static bool expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, Index parentRTindex, Relation parentrel, PlanRowMark *parentrc, Relation childrel, bool *has_child, List **appinfos, List **partitioned_child_rels); ! static List *make_inh_translation_list(Relation oldrelation, Relation newrelation, ! Index newvarno); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node, *** *** 1476,1481 expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) --- 1475,1482 * in which they appear in the PartitionDesc. But first, expand the * parent itself. */ + + /* ignore the return value since this doesn't exclude the parent */ expand_single_inheritance_child(root, rte, rti, oldrelation, oldrc, oldrelation, &has_child, &appinfos, *** *** 1497,1502 expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) --- 1498,
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Hello, I'll add this to CF2017-09. At Tue, 27 Jun 2017 16:27:18 +0900, Amit Langote wrote in <75fe42df-b1d8-89ff-596d-d9da0749e...@lab.ntt.co.jp> > On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote: > > At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote: > >> > >> I recall I had proposed a fix for the same thing some time ago [1]. > > > > Wow. About 1.5 years ago and stuck? I have a concrete case where > > this harms so I'd like to fix that this time. How can we move on? > > Agreed that this should be fixed. > > Your proposed approach #1 to recheck the inheritance after obtaining the > lock on the child table seems to be a good way forward. > > Approach #2 of reordering locking is a simpler solution, but does not > completely prevent the problem, because DROP TABLE child can still cause > it to occur, as you mentioned. > > >>> The cause is that NO INHERIT doesn't take an exlusive lock on the > >>> parent. This allows expand_inherited_rtentry to add the child > >>> relation into appendrel after removal from the inheritance but > >>> still exists. > >> > >> Right. > >> > >>> I see two ways to fix this. > >>> > >>> The first patch adds a recheck of inheritance relationship if the > >>> corresponding attribute is missing in the child in > >>> make_inh_translation_list(). The recheck is a bit complex but it > >>> is not performed unless the sequence above is happen. It checks > >>> duplication of relid (or cycles in inheritance) following > >>> find_all_inheritors (but doing a bit different) but I'm not sure > >>> it is really useful. > >> > >> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but > >> I guess your hash table based solution will do the job as far as > >> performance of this check is concerned, although I haven't checked the > >> code closely. > > > > The hash table is not crucial in the patch. Substantially it just > > recurs down looking up pg_inherits for the child. I considered > > the new index but abandoned because I thought that such case > > won't occur so frequently. > > Agreed. BTW, the hash table in patch #1 does not seem to be really > helpful. In the while loop in is_descendant_of_internal(), does > hash_search() ever return found = true? AFAICS, it does not. > > >>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on > >>> the parent first. > >>> > >>> Since the latter has a larger impact on the current behavior and > >>> we already treat "DROP TABLE child" case in the similar way, I > >>> suppose that the first approach would be preferable. > >> > >> That makes sense. So, I attached only the first patch, rebased on the current master (It actually failed to apply on it.) and fixed a typo in a comment. This still runs a closed-loop test using temporary hash but it seems a bit paranoic. (this is the same check with what find_all_inheritors is doing) << the following is another topic >> > >> BTW, in the partitioned table case, the parent is always locked first > >> using an AccessExclusiveLock. There are other considerations in that case > >> such as needing to recreate the partition descriptor upon termination of > >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases). > > > > Apart from the degree of concurrency, if we keep parent->children > > order of locking, such recreation does not seem to be > > needed. Maybe I'm missing something. > > Sorry to have introduced that topic in this thread, but I will try to > explain anyway why things are the way they are currently: > > Once a table is no longer a partition of the parent (detached or dropped), > we must make sure that the next commands in the transaction don't see it > as one. That information is currently present in the relcache > (rd_partdesc), which is used by a few callers, most notably the > tuple-routing code. Next commands must recreate the entry so that the > correct thing happens based on the updated information. More precisely, > we must invalidate the current entry. RelationClearRelation() will either > delete the entry or rebuild it. If it's being referenced somewhere, it > will be rebuilt. The place holding the reference may also be looking at > the content of rd_partdesc, which we don't require them to make a copy of, > so we must preserve its content while other fields of RelationData are > being read anew from the catalog. We don't have to preserve it if there > has been any change (partition added/dropped), but to make such a change > one would need to take a strong enough lock on the relation (parent). We > assume here that anyone who wants to reference rd_partdesc takes at least > AccessShareLock lock on the relation, and anyone who wants to change its > content must take a lock that will conflict with it, so > AccessExclusiveLock. Note that in all of this, we are only talking about > one relation, that is the parent, so parent -> child ordering of taking > locks may be irrelevant. I think I understand this, anyway DropInhe
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote: > At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote: >> >> I recall I had proposed a fix for the same thing some time ago [1]. > > Wow. About 1.5 years ago and stuck? I have a concrete case where > this harms so I'd like to fix that this time. How can we move on? Agreed that this should be fixed. Your proposed approach #1 to recheck the inheritance after obtaining the lock on the child table seems to be a good way forward. Approach #2 of reordering locking is a simpler solution, but does not completely prevent the problem, because DROP TABLE child can still cause it to occur, as you mentioned. >>> The cause is that NO INHERIT doesn't take an exlusive lock on the >>> parent. This allows expand_inherited_rtentry to add the child >>> relation into appendrel after removal from the inheritance but >>> still exists. >> >> Right. >> >>> I see two ways to fix this. >>> >>> The first patch adds a recheck of inheritance relationship if the >>> corresponding attribute is missing in the child in >>> make_inh_translation_list(). The recheck is a bit complex but it >>> is not performed unless the sequence above is happen. It checks >>> duplication of relid (or cycles in inheritance) following >>> find_all_inheritors (but doing a bit different) but I'm not sure >>> it is really useful. >> >> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but >> I guess your hash table based solution will do the job as far as >> performance of this check is concerned, although I haven't checked the >> code closely. > > The hash table is not crucial in the patch. Substantially it just > recurs down looking up pg_inherits for the child. I considered > the new index but abandoned because I thought that such case > won't occur so frequently. Agreed. BTW, the hash table in patch #1 does not seem to be really helpful. In the while loop in is_descendant_of_internal(), does hash_search() ever return found = true? AFAICS, it does not. >>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on >>> the parent first. >>> >>> Since the latter has a larger impact on the current behavior and >>> we already treat "DROP TABLE child" case in the similar way, I >>> suppose that the first approach would be preferable. >> >> That makes sense. >> >> BTW, in the partitioned table case, the parent is always locked first >> using an AccessExclusiveLock. There are other considerations in that case >> such as needing to recreate the partition descriptor upon termination of >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases). > > Apart from the degree of concurrency, if we keep parent->children > order of locking, such recreation does not seem to be > needed. Maybe I'm missing something. Sorry to have introduced that topic in this thread, but I will try to explain anyway why things are the way they are currently: Once a table is no longer a partition of the parent (detached or dropped), we must make sure that the next commands in the transaction don't see it as one. That information is currently present in the relcache (rd_partdesc), which is used by a few callers, most notably the tuple-routing code. Next commands must recreate the entry so that the correct thing happens based on the updated information. More precisely, we must invalidate the current entry. RelationClearRelation() will either delete the entry or rebuild it. If it's being referenced somewhere, it will be rebuilt. The place holding the reference may also be looking at the content of rd_partdesc, which we don't require them to make a copy of, so we must preserve its content while other fields of RelationData are being read anew from the catalog. We don't have to preserve it if there has been any change (partition added/dropped), but to make such a change one would need to take a strong enough lock on the relation (parent). We assume here that anyone who wants to reference rd_partdesc takes at least AccessShareLock lock on the relation, and anyone who wants to change its content must take a lock that will conflict with it, so AccessExclusiveLock. Note that in all of this, we are only talking about one relation, that is the parent, so parent -> child ordering of taking locks may be irrelevant. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Hello, At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote in <7f5fe522-f328-3c40-565f-5e1ce3745...@lab.ntt.co.jp> > Hello, > > On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote: > > Hello. > > > > I had a case of unexpected error caused by ALTER TABLE NO > > INHERIT. > > > > =# CREATE TABLE p (a int); > > =# CREATE TABLE c1 () INHERITS (p); > > > > session A=# BEGIN; > > session A=# ALTER TABLE c1 NO INHERIT p; > > > > session B=# EXPLAIN ANALYZE SELECT * FROM p; > > (blocked) > > > > session A=# COMMIT; > > > > session B: ERROR: could not find inherited attribute "a" of relation "c1" > > > > This happens at least back to 9.1 to master and doesn't seem to > > be a designed behavior. > > I recall I had proposed a fix for the same thing some time ago [1]. Wow. About 1.5 years ago and stuck? I have a concrete case where this harms so I'd like to fix that this time. How can we move on? > > The cause is that NO INHERIT doesn't take an exlusive lock on the > > parent. This allows expand_inherited_rtentry to add the child > > relation into appendrel after removal from the inheritance but > > still exists. > > Right. > > > I see two ways to fix this. > > > > The first patch adds a recheck of inheritance relationship if the > > corresponding attribute is missing in the child in > > make_inh_translation_list(). The recheck is a bit complex but it > > is not performed unless the sequence above is happen. It checks > > duplication of relid (or cycles in inheritance) following > > find_all_inheritors (but doing a bit different) but I'm not sure > > it is really useful. > > I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but > I guess your hash table based solution will do the job as far as > performance of this check is concerned, although I haven't checked the > code closely. The hash table is not crucial in the patch. Substantially it just recurs down looking up pg_inherits for the child. I considered the new index but abandoned because I thought that such case won't occur so frequently. > > The second patch lets ALTER TABLE NO INHERIT to acquire locks on > > the parent first. > > > > Since the latter has a larger impact on the current behavior and > > we already treat "DROP TABLE child" case in the similar way, I > > suppose that the first approach would be preferable. > > That makes sense. > > BTW, in the partitioned table case, the parent is always locked first > using an AccessExclusiveLock. There are other considerations in that case > such as needing to recreate the partition descriptor upon termination of > inheritance (both the DETACH PARTITION and also DROP TABLE child cases). Apart from the degree of concurrency, if we keep parent->children order of locking, such recreation does not seem to be needed. Maybe I'm missing something. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Hello, On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote: > Hello. > > I had a case of unexpected error caused by ALTER TABLE NO > INHERIT. > > =# CREATE TABLE p (a int); > =# CREATE TABLE c1 () INHERITS (p); > > session A=# BEGIN; > session A=# ALTER TABLE c1 NO INHERIT p; > > session B=# EXPLAIN ANALYZE SELECT * FROM p; > (blocked) > > session A=# COMMIT; > > session B: ERROR: could not find inherited attribute "a" of relation "c1" > > This happens at least back to 9.1 to master and doesn't seem to > be a designed behavior. I recall I had proposed a fix for the same thing some time ago [1]. > The cause is that NO INHERIT doesn't take an exlusive lock on the > parent. This allows expand_inherited_rtentry to add the child > relation into appendrel after removal from the inheritance but > still exists. Right. > I see two ways to fix this. > > The first patch adds a recheck of inheritance relationship if the > corresponding attribute is missing in the child in > make_inh_translation_list(). The recheck is a bit complex but it > is not performed unless the sequence above is happen. It checks > duplication of relid (or cycles in inheritance) following > find_all_inheritors (but doing a bit different) but I'm not sure > it is really useful. I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but I guess your hash table based solution will do the job as far as performance of this check is concerned, although I haven't checked the code closely. > The second patch lets ALTER TABLE NO INHERIT to acquire locks on > the parent first. > > Since the latter has a larger impact on the current behavior and > we already treat "DROP TABLE child" case in the similar way, I > suppose that the first approach would be preferable. That makes sense. BTW, in the partitioned table case, the parent is always locked first using an AccessExclusiveLock. There are other considerations in that case such as needing to recreate the partition descriptor upon termination of inheritance (both the DETACH PARTITION and also DROP TABLE child cases). Thanks, Amit [1] https://www.postgresql.org/message-id/565EB1F2.7000201%40lab.ntt.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Hello. I had a case of unexpected error caused by ALTER TABLE NO INHERIT. =# CREATE TABLE p (a int); =# CREATE TABLE c1 () INHERITS (p); session A=# BEGIN; session A=# ALTER TABLE c1 NO INHERIT p; session B=# EXPLAIN ANALYZE SELECT * FROM p; (blocked) session A=# COMMIT; session B: ERROR: could not find inherited attribute "a" of relation "c1" This happens at least back to 9.1 to master and doesn't seem to be a designed behavior. The cause is that NO INHERIT doesn't take an exlusive lock on the parent. This allows expand_inherited_rtentry to add the child relation into appendrel after removal from the inheritance but still exists. I see two ways to fix this. The first patch adds a recheck of inheritance relationship if the corresponding attribute is missing in the child in make_inh_translation_list(). The recheck is a bit complex but it is not performed unless the sequence above is happen. It checks duplication of relid (or cycles in inheritance) following find_all_inheritors (but doing a bit different) but I'm not sure it is really useful. The second patch lets ALTER TABLE NO INHERIT to acquire locks on the parent first. Since the latter has a larger impact on the current behavior and we already treat "DROP TABLE child" case in the similar way, I suppose that the first approach would be preferable. Any comments or thoughts? regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c index e5fb52c..1670d11 100644 --- a/src/backend/catalog/pg_inherits.c +++ b/src/backend/catalog/pg_inherits.c @@ -42,6 +42,8 @@ typedef struct SeenRelsEntry ListCell *numparents_cell; /* corresponding list cell */ } SeenRelsEntry; +static bool is_descendent_of_internal(Oid parentId, Oid childId, + HTAB *seen_rels); /* * find_inheritance_children * @@ -376,3 +378,71 @@ typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId) return result; } + +/* + * Check if the child is really a descendent of the parent + */ +bool +is_descendent_of(Oid parentId, Oid childId) +{ + HTAB *seen_rels; + HASHCTL ctl; + bool ischild = false; + + memset(&ctl, 0, sizeof(ctl)); + ctl.keysize = sizeof(Oid); + ctl.entrysize = sizeof(Oid); + ctl.hcxt = CurrentMemoryContext; + + seen_rels = hash_create("is_descendent_of temporary table", + 32, /* start small and extend */ + &ctl, + HASH_ELEM | HASH_BLOBS | HASH_CONTEXT); + + ischild = is_descendent_of_internal(parentId, childId, seen_rels); + + hash_destroy(seen_rels); + + return ischild; +} + +static bool +is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels) +{ + Relation inhrel; + SysScanDesc scan; + ScanKeyData key[1]; + bool ischild = false; + HeapTuple inheritsTuple; + + inhrel = heap_open(InheritsRelationId, AccessShareLock); + ScanKeyInit(&key[0], Anum_pg_inherits_inhparent, +BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(parentId)); + scan = systable_beginscan(inhrel, InheritsParentIndexId, true, + NULL, 1, key); + + while ((inheritsTuple = systable_getnext(scan)) != NULL) + { + bool found; + Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid; + + hash_search(seen_rels, &inhrelid, HASH_ENTER, &found); + + /* + * Recursively check into children. Although there can't theoretically + * be any cycles in the inheritance graph, check the cycles following + * find_all_inheritors. + */ + if (inhrelid == childId || + (!found && is_descendent_of_internal(inhrelid, childId, seen_rels))) + { + ischild = true; + break; + } + } + + systable_endscan(scan); + heap_close(inhrel, AccessShareLock); + + return ischild; +} diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index cf46b74..f1c582a 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -99,10 +99,9 @@ static List *generate_append_tlist(List *colTypes, List *colCollations, static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist); static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti); -static void make_inh_translation_list(Relation oldrelation, +static List *make_inh_translation_list(Relation oldrelation, Relation newrelation, - Index newvarno, - List **translated_vars); + Index newvarno); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node, @@ -1502,14 +1501,28 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) */ if (childrte->relkind != RELKIND_PARTITIONED_TABLE) { + List *translated_vars = +make_inh_translation_list(oldrelation, newrelation, + childRTindex); + + if (!translated_vars) + { +/* + * This childrel is no longer a child of the parent. + * Close the child relation releas