Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

```On Thu, Apr 20, 2017 at 10:42 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Apr 18, 2017 at 6:55 AM, Ashutosh Bapat
> <ashutosh.ba...@enterprisedb.com> wrote:
>> When we merge partition bounds from two relations with different
>> partition key types, the merged partition bounds need to have some
>> information abound the way those constants look like e.g. their
>> length, structure etc. That's the reason we need to store partition
>> key types of merged partitioning scheme. Consider a three way join (i4
>> JOIN i8 ON i4.x = i8.x) JOIN i2 ON (i2.x = i.x). When we compare
>> partition bounds of i4 and i8, we use operators for int4 and int8. The
>> join i4 JOIN i8 will get partition bounds by merging those of i4 and
>> i8. When we come to join with i2, we need to know which operators to
>> use for comparing the partition bounds of the join with those of i2.
>>
>> So, if the partition key types of the joining relations differ (but
>> they have matching partitioning schemes per strategy, natts and
>> operator family) the partition bounds of the join are converted to the
>> wider type among the partition key types of the joining tree.
>> Actually, as I am explained earlier we could choose a wider outer type
>> for an OUTER join and shorter type for inner join. This type is used
>> as partition key type of the join. In the above case join between i4
>> and i8 have its partition bounds converted to i8 (or i4) and then when
>> it is joined with i2 the partition bounds of the join are converted to
>> i8 (or i2).
>
> I don't understand why you think that partition-wise join needs any
> new logic here; if this were a non-partitionwise join, we'd similarly
> need to use the correct operator, but the existing code handles that
> just fine.  If the join is performed partition-wise, it should use the
> same operators that would have been used by a non-partitionwise join
> between the same tables.
>
> I think the choice of operator depends only on the column types, and
> that the "width" of those types has nothing to do with it.  For
> example, if the user writes .WHERE A.x = B.x AND B.x = C.x, the
> operator for an A/B join or a B/C join will be the one that appears in
> the query; parse analysis will have identified which specific operator
> is meant based on the types of the columns.  If the optimizer
> subsequently decides to reorder the joins and perform the A/C join
> first, it will go hunt down the operator with the same strategy number
> in the same operator family that takes the type of A.x on one side and
> the type of C.x on the other side.  No problem.  A partition-wise join
> between A and C will use that same operator; again, no problem.
>
> Your example involves joining the output of a join between i4 and i8
> against i2, so it seems there is some ambiguity about what the input
> type should be.  But, again, the planner already copes with this
> problem.  In fact, the join is performed either using i4.x or i8.x --
> I don't know what happens, or whether it depends on other details of
> the query or the plan -- and the operator which can accept that value
> on one side and i2.x on the other side is the one that gets used.```
```
I think you are confusing join condition application and partition
bounds of a join relation. What you have described above is how
operators are chosen to apply join conditions - it picks up the
correct operator from the operator family based on the column types
being used in join condition. That it can do because the columns being
joined are both present the relations being joined, irrespective of
which pair of relations is being joined. In your example, A.x, B.x and
C.x are all present on one of the sides of join irrespective of
whether the join is executed as (AB)C, A(BC) or (AC)B.

But the problem we are trying to solve here about partition bounds of
the join relation: what should be the partition bounds of AB, BC or
AC? When we compare partition bounds of and intermediate join with
other intermediate join (e.g. AB with those of C) what operator should
be used? You seem to be suggesting that we keep as many sets of
partition bounds as there are base relations participating in the join
and then use appropriate partition bounds based on the columns in the
join conditions, so that we can use the same operator as used in the
join condition. That doesn't seem to be a good option since the
partition bounds will all have same values, only differing in their
binary representation because of differences in data types. I am of
the opinion that we save a single set of partition bounds. We have to
then associate a data type with bounds to know binary representation
of partition bound datums. That datatype would be one of the partition
key types of joining relations. I may be wrong in using term "wider"
since its associated with the length of binary reprentation. But we
need some logic to coalesce the two data types based on the type of
join and key type on the outer side.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
```