Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

Robert Haas Thu, 20 Apr 2017 13:05:43 -0700

On Thu, Apr 20, 2017 at 8:45 AM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> I think you are confusing join condition application and partition
> bounds of a join relation.


You're right, I misunderstood what you were talking about.

> But the problem we are trying to solve here about partition bounds of
> the join relation: what should be the partition bounds of AB, BC or
> AC? When we compare partition bounds of and intermediate join with
> other intermediate join (e.g. AB with those of C) what operator should
> be used? You seem to be suggesting that we keep as many sets of
> partition bounds as there are base relations participating in the join
> and then use appropriate partition bounds based on the columns in the
> join conditions, so that we can use the same operator as used in the
> join condition. That doesn't seem to be a good option since the
> partition bounds will all have same values, only differing in their
> binary representation because of differences in data types.

Well, actually, I think it is a good option, as I wrote in
http://postgr.es/m/CA+TgmoY-LiJ+_S7OijNU_r2y=dhsj539wtqa7cayj-hcecc...@mail.gmail.com

In that email, my principal concern was allowing partition-wise join
to succeed even with slightly different sets of partition boundaries
on the two sides of the join; in particular, if we've got A with A1 ..
A10 and B with B1 .. B10 and the DBA adds A11, I don't want
performance to tank until the DBA gets around to adding B11.  Removing
the partition bounds from the PartitionScheme and storing them
per-RelOptInfo fixes that problem; the fact that it also solves this
problem of what happens when we have different data types on the two
sides looks to me like a second reason to go that way.

And there's a third reason, too, which is that the opfamily mechanism
doesn't currently provide any mechanism for reasoning about which data
types are "wider" or "narrower" in the way that you want.  In general,
there's not even a reason why such a relationship has to exist;
consider two data types t1 and t2 with opclasses t1_ops and t2_ops
that are part of the same opfamily t_ops, and suppose that t1 can
represent any positive integer and t2 can represent any even integer,
or in general that each data type can represent some but not all of
the values that can be represented by the other data type.  In such a
case, neither would be "wider" than the other in the sense that you
need; you essentially want to find a data type within the opfamily to
which all values of any of the types involved in the query can be cast
without error, but there is nothing today which requires such a data
type to exist, and no way to identify which one it is.  In practice,
for all of the built-in opfamilies that have more than one opclass,
such a data type always exists but is not always unique -- in
particular, datetime_ops contains date_ops, timestamptz_ops, and
timestamp_ops, and either of the latter two is a plausible choice for
the "widest" data type of the three.  But there's no way to figure
that out from the opfamily or opclass information we have today.

In theory, it would be possible to modify the opfamily machinery so
that every opfamily designates an optional ordering of types from
"narrowest" to "widest", such that saying t1 is-narrower-than t2 is a
guarantee that every value of type t1 can be cast without error to a
value of type t2.  But I think that's a bad plan.  It means that every
opfamily created by either the core code or some extension now needs
to worry about annotating the opclass with this new information, and
we have to add to core the SQL syntax and supporting code to make that
work.  If it were implementing a valuable feature which could not
practically be implemented without extending the opfamily machinery,
then I guess that's what we'd have to suck it up and incur that
complexity, but in this case it does not appear necessary.  Storing
the partition bounds per-RelOptInfo makes this problem -- and a few
others -- go away.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

Reply via email to