On Mon, Sep 11, 2017 at 12:16 PM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > On 2017/09/09 2:38, Ashutosh Bapat wrote: >> On Fri, Sep 8, 2017 at 11:17 AM, Amit Langote wrote: >>> I updated the patch to include just those changes. I'm not sure about >>> one of the Ashutosh's changes whereby the child PlanRowMark is also passed >>> to expand_partitioned_rtentry() to use as the parent PlanRowMark. I think >>> the child RTE, child RT index and child Relation are fine, because they >>> are necessary for creating AppendRelInfos in a desired way for later >>> planning steps. But PlanRowMarks are not processed within the planner >>> afterwards and do not need to be marked with the immediate parent-child >>> association in the same way that AppendRelInfos need to be. >> >> Passing top parent's row mark works today, since there is no >> parent-child specific translation happening there. But if in future, >> we introduce such a translation, row marks for indirect children in a >> multi-level partitioned hierarchy won't be accurate. So, I think it's >> better to pass row marks of the direct parent. > > IMHO, we should make it the responsibility of the future patch to set a > child PlanRowMark's prti to the direct parent's RT index, when we actually > know that it's needed for something. We clearly know today why we need to > pass the other objects like child RT entry, RT index, and Relation, so we > should limit this patch to pass only those objects to the recursive call. > That makes this patch a relatively easy to understand change.
I think you are mixing two issues here 1. setting parent RTI in child PlanRowMark and 2. passing immediate parent's PlanRowMark to expand_single_inheritance_child(). I have discussed 1 in my reply to Robert. About 2 you haven't given any particular comments to my reply. To me it looks like it's this patch that introduces the notion of multi-level expansion, so it's natural for this patch to pass PlanRowMark in cascaded fashion similar to other structures. > >>> I also included the changes to add_paths_to_append_rel() from my patch on >>> the "path toward faster partition pruning" thread. We'd need that change, >>> because while add_paths_to_append_rel() is called for all partitioned >>> table RTEs in a given partition tree, expand_inherited_rtentry() would >>> have set up a PartitionedChildRelInfo only for the root parent, so >>> get_partitioned_child_rels() would not find the same for non-root >>> partitioned table rels and crash failing the Assert. The changes I made >>> are such that we call get_partitioned_child_rels() only for the parent >>> rels that are known to correspond root partitioned tables (or as you >>> pointed out on the thread, "the table named in the query" as opposed those >>> added to the query as result of inheritance expansion). In addition to >>> the relkind check on the input RTE, it would seem that checking that the >>> reloptkind is RELOPT_BASEREL would be enough. But actually not, because >>> if a partitioned table is accessed in a UNION ALL query, reloptkind even >>> for the root partitioned table (the table named in the query) would be >>> RELOPT_OTHER_MEMBER_REL. The only way to confirm that the input rel is >>> actually the root partitioned table is to check whether its parent rel is >>> not RTE_RELATION, because the parent in case of UNION ALL Append is a >>> RTE_SUBQUERY RT entry. >>> >> >> There was a change in my 0003 patch, which fixed the crash. It checked >> for RELOPT_BASEREL and RELKIND_PARTITIONED_TABLE. I have pulled it in >> my 0001 patch. It no more crashes. I tried various queries involving >> set operations and bare multi-level partitioned table scan with my >> patch, but none of them showed any anomaly. Do you have a testcase >> which shows problem with my patch? May be your suggestion is correct, >> but corresponding code implementation is slightly longer than I would >> expect. So, we should go with it, if there is corresponding testcase >> which shows why it's needed. > > If we go with your patch, partitioned tables won't get locked, for > example, in case of the following query (p is a partitioned table): > > select 1 from p union all select 2 from p; > > That's because the RelOptInfos for the two instances of p in the above > query are RELOPT_OTHER_MEMBER_REL, not RELOPT_BASEREL. They are children > of the Append corresponding to the UNION ALL subquery RTE. So, > partitioned_rels does not get set per your proposed code. Session 1: postgres=# begin; BEGIN postgres=# select 1 from t1 union all select 2 from t1; ?column? ---------- (0 rows) postgres=# select pg_backend_pid(); pg_backend_pid ---------------- 28843 (1 row) Session 2 postgres=# select locktype, relation::regclass, virtualxid, virtualtransaction, pid, mode, granted, fastpath from pg_locks; locktype | relation | virtualxid | virtualtransaction | pid | mode | granted | fastpath ------------+----------+------------+--------------------+-------+-----------------+---------+---------- relation | pg_locks | | 4/14 | 28854 | AccessShareLock | t | t virtualxid | | 4/14 | 4/14 | 28854 | ExclusiveLock | t | t relation | t1p1p1 | | 3/9 | 28843 | AccessShareLock | t | t relation | t1p1 | | 3/9 | 28843 | AccessShareLock | t | t relation | t1 | | 3/9 | 28843 | AccessShareLock | t | t virtualxid | | 3/9 | 3/9 | 28843 | ExclusiveLock | t | t (6 rows) So, all partitioned partitions are getting locked correctly. Am I missing something? > >> >> In your patch >> >> + parent_rel = root->simple_rel_array[parent_relid]; >> + get_pcinfo = (parent_rel->rtekind == RTE_SUBQUERY); >> >> Do you mean RTE_RELATION as you explained above? > > No, I mean RTE_SUBQUERY. > > If the partitioned table RTE in question corresponds to one named in the > query, we should be able to find its pcinfo in root->pcinfo_list. If the > partitioned table RTE is one added as result of inheritance expansion, it > won't have an associated pcinfo. So, we should find a way to distinguish > them from one another. The first idea that had occurred to me was the > same as yours -- RelOptInfo of the partitioned table RTE named in the > query would be of reloptkind RELOPT_BASEREL and those of the partitioned > table RTE added as result of inheritance expansion will be of reloptkind > RELOPT_OTHER_MEMBER_REL. Although the latter is always true, the former > is not. If the partitioned table named in the query appears under UNION > ALL query, then its reloptkind will be RELOPT_OTHER_MEMBER_REL. That > means we have to use some other means to distinguish partitioned table > RTEs that have an associated pcinfo from those that don't. So, I devised > this method of looking at the parent RTE (if any) for distinguishing the > two. Partitioned table named in the query either doesn't have the parent > or if it does, the parent could only ever be a UNION ALL subquery > (RTE_SUBQUERY). Partitioned tables added as part of inheritance expansion > will always have the parent and the parent will only ever be a table > (RTE_RELATION). > Actually, the original problem that caused this discussion started with an assertion failure in get_partitioned_child_rels() as Assert(list_length(result) >= 1); This assertion fails if result is NIL when an intermediate partitioned table is passed. May be we should assert (result == NIL || list_length(result) == 1) and allow that function to be called even for intermediate partitioned partitions for which the function will return NIL. That will leave the code in add_paths_to_append_rel() simple. Thoughts? > > In create_lateral_join_info(): > > + Assert(IS_SIMPLE_REL(brel)); > + Assert(brte); > > The second Assert is either unnecessary or should be placed first. simple_rte_array[] may have some NULL entries. Second assert makes sure that we aren't dealing with a NULL entry. Any particular reason to reorder the asserts? > > The following comment could be made a bit clearer. > > + * In the case of table inheritance, the parent RTE is directly > linked > + * to every child table via an AppendRelInfo. In the case of table > + * partitioning, the inheritance hierarchy is expanded one level at a > + * time rather than flattened. Therefore, an other member rel > that is > + * a partitioned table may have children of its own, and must > + * therefore be marked with the appropriate lateral info so that > those > + * children eventually get marked also. > > How about: In the case of partitioned table inheritance, the original > parent RTE is linked, via AppendRelInfo, only to its immediate partitions. > Partitions below the first level are accessible only via their immediate > parent's RelOptInfo, which would be of kind RELOPT_OTHER_MEMBER_REL, so > consider those as well. I don't see much difference between those two. We usually do not use macros in comments, so usually comments mention "other member" rel. Let's leave this for the committer to judge. > > In expand_inherited_rtentry(), the following comment fragment is obsolete, > because we *do* now create AppendRelInfo's for partitioned children: > > + /* > + * We keep a list of objects in root, each of which maps a > partitioned > + * parent RT index to the list of RT indexes of its partitioned child > + * tables which do not have AppendRelInfos associated with those. Good catch. I have reworded it as /* * We keep a list of objects in root, each of which maps a root * partitioned parent RT index to the list of RT indexes of descendant * partitioned child tables. Does that look good? > > > By the way, when we call expand_single_inheritance_child() in the > non-partitioned inheritance case, we should pass NULL for childrte_p, > childRTindex_p, childrc_p, instead of declaring variables that won't be > used. Hence, expand_single_inheritance_child() should make those > arguments optional. That introduces an extra "if" condition, which is costlier than an assignment. We have used both the styles in the code. Previously, I have got comments otherwise. So, I am not sure. > > + > + /* > + * If the partitioned table has no partitions or all the partitions are > + * temporary tables from other backends, treat this as non-inheritance > + * case. > + */ > + if (!has_child) > + parentrte->inh = false; > > I guess the above applies to all partitioned tables in the tree, so, I > think we should update the comment in set_rel_size(): > > else if (rte->relkind == RELKIND_PARTITIONED_TABLE) > { > /* > * A partitioned table without leaf partitions is marked > * as a dummy rel. > */ > set_dummy_rel_pathlist(rel); > } > > to say: a partitioned table without partitions is marked as a dummy rel. Done. Thanks again for the catch. I will update the patches once we have some resolution about 1. prti in PlanRowMarks and 2. detection of root partitioned table in add_paths_to_append_rel(). -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers