Re: [HACKERS] advanced partition matching algorithm for partition-wise join

Dmitry Dolgov Wed, 28 Mar 2018 14:24:40 -0700

> On 22 March 2018 at 14:18, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> 
> wrote:
> On Thu, Mar 22, 2018 at 4:32 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>>> On 12 March 2018 at 06:00, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> 
>>> wrote:
>>> Thanks for the note. Here are rebased patches.
>>
>> Since I started to look at this patch, I can share few random notes (although
>> it's not a complete review, I'm in the progress now), maybe it can be 
>> helpful.
>>
>> In `partition_range_bounds_merge`
>>
>>     + if (!merged)
>>     +     break;
>>
>> is a bit redundant I think, because every time `merged` set to false it
>> followed by break.
>
> Yes, right now. May be I should turn it into Assert(merged); What do you 
> think?


Thank you for reply. Yes, that sounds good. But actually you also mentioned
another topic that bothers me about this patch. Different parts of the
algorithm implementation (at least for functions that build maps of matching
partitions) are quite dependent on each other in terms of shared state. At
first glance in `partition_range_bounds_merge` we have about a dozen of
variables of different mutability level, that affect the control flow:

    outer_lb_index
    inner_lb_index
    merged
    merged_index
    overlap
    merged_lb
    merged_ub
    finished_outer
    finished_inner
    ub_cmpval
    lb_cmpval
    inner_has_default
    outer_has_default
    jointype

It looks a bit too much for me, and would require commentaries like "if you
changed the logic here, also take a look there". But I'm not saying that I have
any specific suggestions how to change it (although I'll definitely try to do
so, at least to get some better understanding of the underlying algorithm).

>>
>> I've noticed that in some places `IS_PARTITIONED_REL` was replaced
>>
>>     - if (!IS_PARTITIONED_REL(joinrel))
>>     + if (joinrel->part_scheme == NULL)
>>
>> but I'm not quite follow why? Is it because `boundinfo` is not available
>> anymore at this point? If so, maybe it makes sense to update the commentary 
>> for
>> this macro and mention to not use for joinrel.
>
>
> This is done in try_partitionwise_join(). As explained in the comment
>
>      * Get the list of matching partitions to be joined along with the
>      * partition bounds of the join relation. Because of the restrictions
>      * imposed by partition matching algorithm, not every pair of joining
>      * relations for this join will be able to use partition-wise join. But 
> all
>      * those pairs which can use partition-wise join will produce the same
>      * partition bounds for the join relation.
>
> boundinfo for the join relation is built in this function. So, we
> don't have join relation's partitioning information fully set up yet.
> So we can't use IS_PARTITIONED_REL() there. joinrel->part_scheme if
> set tells that the joining relations have matching partition schemes
> and thus the join relation can possibly use partition-wise join
> technique. If it's not set, then we can't use partition-wise join.
>
> But IS_PARTITIONED_REL() is still useful at a number of other places,
> where it's known to encounter a RelOptInfo whose partitioning
> properties are fully setup. So, I don't think we should change the
> macro or the comments above it.

Just to make myself clear, I wanted to suggest not to change the commentary for
`IS_PARTITIONED_REL` significantly, but just add a sentence that you need to
check if given relation is fully set up.

Also, few more random notes (mostly related to readability, since I found some
parts of the patch hard to read, but of course it's arguable).

```
    PartitionRangeBound outer_lb;
    PartitionRangeBound outer_ub;
    PartitionRangeBound inner_lb;
    PartitionRangeBound inner_ub;
    PartitionRangeBound *merged_lb = NULL;
    PartitionRangeBound *merged_ub = NULL;
```

Maybe it would be better to not repeat the type here? Let's say:

```
    PartitionRangeBound outer_lb,
                        outer_ub,
                        ...
```

It's just too long and distracting.

```
partition_range_bounds_merge(int partnatts, FmgrInfo *partsupfuncs,
                             Oid *partcollations, PartitionBoundInfo outer_bi,
                             int outer_nparts, PartitionBoundInfo inner_bi,
                             int inner_nparts, JoinType jointype,
                             List **outer_parts, List **inner_parts)
```

>From what I see in `partition.c` there are a lot functions that accept
`partnatts`, `partcollations` only to pass it down to, e.g.
`partition_rbound_cmp`.
What do you think about introducing a data structure to keep these arguments,
and pass only an instance of this structure instead?

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

Reply via email to