On Mon, Sep 11, 2017 at 12:16 PM, Amit Langote
<langote_amit...@lab.ntt.co.jp> wrote:
> On 2017/09/09 2:38, Ashutosh Bapat wrote:
>> On Fri, Sep 8, 2017 at 11:17 AM, Amit Langote wrote:
>>> I updated the patch to include just those changes.  I'm not sure about
>>> one of the Ashutosh's changes whereby the child PlanRowMark is also passed
>>> to expand_partitioned_rtentry() to use as the parent PlanRowMark.  I think
>>> the child RTE, child RT index and child Relation are fine, because they
>>> are necessary for creating AppendRelInfos in a desired way for later
>>> planning steps.  But PlanRowMarks are not processed within the planner
>>> afterwards and do not need to be marked with the immediate parent-child
>>> association in the same way that AppendRelInfos need to be.
>> Passing top parent's row mark works today, since there is no
>> parent-child specific translation happening there. But if in future,
>> we introduce such a translation, row marks for indirect children in a
>> multi-level partitioned hierarchy won't be accurate. So, I think it's
>> better to pass row marks of the direct parent.
> IMHO, we should make it the responsibility of the future patch to set a
> child PlanRowMark's prti to the direct parent's RT index, when we actually
> know that it's needed for something.  We clearly know today why we need to
> pass the other objects like child RT entry, RT index, and Relation, so we
> should limit this patch to pass only those objects to the recursive call.
> That makes this patch a relatively easy to understand change.

I think you are mixing two issues here 1. setting parent RTI in child
PlanRowMark and 2. passing immediate parent's PlanRowMark to

I have discussed 1 in my reply to Robert.

About 2 you haven't given any particular comments to my reply. To me
it looks like it's this patch that introduces the notion of
multi-level expansion, so it's natural for this patch to pass
PlanRowMark in cascaded fashion similar to other structures.

>>> I also included the changes to add_paths_to_append_rel() from my patch on
>>> the "path toward faster partition pruning" thread.  We'd need that change,
>>> because while add_paths_to_append_rel() is called for all partitioned
>>> table RTEs in a given partition tree, expand_inherited_rtentry() would
>>> have set up a PartitionedChildRelInfo only for the root parent, so
>>> get_partitioned_child_rels() would not find the same for non-root
>>> partitioned table rels and crash failing the Assert.  The changes I made
>>> are such that we call get_partitioned_child_rels() only for the parent
>>> rels that are known to correspond root partitioned tables (or as you
>>> pointed out on the thread, "the table named in the query" as opposed those
>>> added to the query as result of inheritance expansion).  In addition to
>>> the relkind check on the input RTE, it would seem that checking that the
>>> reloptkind is RELOPT_BASEREL would be enough.  But actually not, because
>>> if a partitioned table is accessed in a UNION ALL query, reloptkind even
>>> for the root partitioned table (the table named in the query) would be
>>> RELOPT_OTHER_MEMBER_REL.  The only way to confirm that the input rel is
>>> actually the root partitioned table is to check whether its parent rel is
>>> not RTE_RELATION, because the parent in case of UNION ALL Append is a
>>> RTE_SUBQUERY RT entry.
>> There was a change in my 0003 patch, which fixed the crash. It checked
>> my 0001 patch. It no more crashes. I tried various queries involving
>> set operations and bare multi-level partitioned table scan with my
>> patch, but none of them showed any anomaly. Do you have a testcase
>> which shows problem with my patch? May be your suggestion is correct,
>> but corresponding code implementation is slightly longer than I would
>> expect. So, we should go with it, if there is corresponding testcase
>> which shows why it's needed.
> If we go with your patch, partitioned tables won't get locked, for
> example, in case of the following query (p is a partitioned table):
> select 1 from p union all select 2 from p;
> That's because the RelOptInfos for the two instances of p in the above
> query are RELOPT_OTHER_MEMBER_REL, not RELOPT_BASEREL.  They are children
> of the Append corresponding to the UNION ALL subquery RTE.  So,
> partitioned_rels does not get set per your proposed code.

Session 1:
postgres=# begin;
postgres=# select 1 from t1 union all select 2 from t1;
(0 rows)

postgres=# select pg_backend_pid();
(1 row)

Session 2
postgres=# select locktype, relation::regclass, virtualxid,
virtualtransaction, pid, mode, granted, fastpath from pg_locks;
  locktype  | relation | virtualxid | virtualtransaction |  pid  |
 mode       | granted | fastpath
 relation   | pg_locks |            | 4/14               | 28854 |
AccessShareLock | t       | t
 virtualxid |          | 4/14       | 4/14               | 28854 |
ExclusiveLock   | t       | t
 relation   | t1p1p1   |            | 3/9                | 28843 |
AccessShareLock | t       | t
 relation   | t1p1     |            | 3/9                | 28843 |
AccessShareLock | t       | t
 relation   | t1       |            | 3/9                | 28843 |
AccessShareLock | t       | t
 virtualxid |          | 3/9        | 3/9                | 28843 |
ExclusiveLock   | t       | t
(6 rows)

So, all partitioned partitions are getting locked correctly. Am I
missing something?

>> In your patch
>> +            parent_rel = root->simple_rel_array[parent_relid];
>> +            get_pcinfo = (parent_rel->rtekind == RTE_SUBQUERY);
>> Do you mean RTE_RELATION as you explained above?
> No, I mean RTE_SUBQUERY.
> If the partitioned table RTE in question corresponds to one named in the
> query, we should be able to find its pcinfo in root->pcinfo_list.  If the
> partitioned table RTE is one added as result of inheritance expansion, it
> won't have an associated pcinfo.  So, we should find a way to distinguish
> them from one another.  The first idea that had occurred to me was the
> same as yours -- RelOptInfo of the partitioned table RTE named in the
> query would be of reloptkind RELOPT_BASEREL and those of the partitioned
> table RTE added as result of inheritance expansion will be of reloptkind
> RELOPT_OTHER_MEMBER_REL.  Although the latter is always true, the former
> is not.  If the partitioned table named in the query appears under UNION
> ALL query, then its reloptkind will be RELOPT_OTHER_MEMBER_REL.  That
> means we have to use some other means to distinguish partitioned table
> RTEs that have an associated pcinfo from those that don't.  So, I devised
> this method of looking at the parent RTE (if any) for distinguishing the
> two.  Partitioned table named in the query either doesn't have the parent
> or if it does, the parent could only ever be a UNION ALL subquery
> (RTE_SUBQUERY).  Partitioned tables added as part of inheritance expansion
> will always have the parent and the parent will only ever be a table

Actually, the original problem that caused this discussion started
with an assertion failure in get_partitioned_child_rels() as
Assert(list_length(result) >= 1);

This assertion fails if result is NIL when an intermediate partitioned
table is passed. May be we should assert (result == NIL ||
list_length(result) == 1) and allow that function to be called even
for intermediate partitioned partitions for which the function will
return NIL. That will leave the code in add_paths_to_append_rel()
simple. Thoughts?

> In create_lateral_join_info():
> +        Assert(IS_SIMPLE_REL(brel));
> +        Assert(brte);
> The second Assert is either unnecessary or should be placed first.

simple_rte_array[] may have some NULL entries. Second assert makes
sure that we aren't dealing with a NULL entry. Any particular reason
to reorder the asserts?

> The following comment could be made a bit clearer.
> +         * In the case of table inheritance, the parent RTE is directly
> linked
> +         * to every child table via an AppendRelInfo.  In the case of table
> +         * partitioning, the inheritance hierarchy is expanded one level at a
> +         * time rather than flattened.  Therefore, an other member rel
> that is
> +         * a partitioned table may have children of its own, and must
> +         * therefore be marked with the appropriate lateral info so that
> those
> +         * children eventually get marked also.
> How about: In the case of partitioned table inheritance, the original
> parent RTE is linked, via AppendRelInfo, only to its immediate partitions.
>  Partitions below the first level are accessible only via their immediate
> parent's RelOptInfo, which would be of kind RELOPT_OTHER_MEMBER_REL, so
> consider those as well.

I don't see much difference between those two. We usually do not use
macros in comments, so usually comments mention "other member" rel.
Let's leave this for the committer to judge.

> In expand_inherited_rtentry(), the following comment fragment is obsolete,
> because we *do* now create AppendRelInfo's for partitioned children:
> +        /*
> +         * We keep a list of objects in root, each of which maps a
> partitioned
> +         * parent RT index to the list of RT indexes of its partitioned child
> +         * tables which do not have AppendRelInfos associated with those.

Good catch. I have reworded it as
         * We keep a list of objects in root, each of which maps a root
         * partitioned parent RT index to the list of RT indexes of descendant
         * partitioned child tables.

Does that look good?

> By the way, when we call expand_single_inheritance_child() in the
> non-partitioned inheritance case, we should pass NULL for childrte_p,
> childRTindex_p, childrc_p, instead of declaring variables that won't be
> used.  Hence, expand_single_inheritance_child() should make those
> arguments optional.

That introduces an extra "if" condition, which is costlier than an
assignment. We have used both the styles in the code. Previously, I
have got comments otherwise. So, I am not sure.

> +
> +    /*
> +     * If the partitioned table has no partitions or all the partitions are
> +     * temporary tables from other backends, treat this as non-inheritance
> +     * case.
> +     */
> +    if (!has_child)
> +        parentrte->inh = false;
> I guess the above applies to all partitioned tables in the tree, so, I
> think we should update the comment in set_rel_size():
>                 else if (rte->relkind == RELKIND_PARTITIONED_TABLE)
>                 {
>                     /*
>                      * A partitioned table without leaf partitions is marked
>                      * as a dummy rel.
>                      */
>                     set_dummy_rel_pathlist(rel);
>                 }
> to say: a partitioned table without partitions is marked as a dummy rel.

Done. Thanks again for the catch.

I will update the patches once we have some resolution about 1. prti
in PlanRowMarks and 2. detection of root partitioned table in

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to