On Thu, Aug 10, 2017 at 5:43 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
>> Do you think we solving this problem is a prerequisite for
>> partition-wise join? Or should we propose that patch as a separate
>> enhancement?
> No, I'm not proposing anything yet.  For now I just wanted to share
> this observation about where hot CPU time goes in simple tests, and
> since it turned out to be a loop in a loop that I could see an easy to
> way to fix for singleton sets and sets with a small range, I couldn't
> help trying it...  But I'm still trying to understand the bigger
> picture.  I'll be interested to compare profiles with the ordered
> append_rel_list version you have mentioned, to see how that moves the
> hot spots.

Perhaps this is stating the obvious, but it's often better to optimize
things like this at a higher level, rather than by tinkering with
stuff like Bitmapset.  On the other hand, sometimes
micro-optimizations are the way to go, because optimizing
find_ec_member_for_tle(), for example, might involve a much broader
rethink of the planner code than we want to undertake right now.

> I guess one very practical question to ask is: can we plan queries
> with realistic numbers of partitioned tables and partitions in
> reasonable times?  Well, it certainly looks very good for hundreds of
> partitions so far...  My own experience of partitioning with other
> RDBMSs has been on that order, 'monthly partitions covering the past
> 10 years' and similar, but on the other hand it wouldn't be surprising
> to learn that people want to go to many thousands, especially for
> schemas which just keep adding partitions over time and don't want to
> drop them.

I've been thinking that it would be good if this feature - and other
new partitioning features - could scale to about 1000 partitions
without too much trouble.  Eventually, it might be nice to scale
higher, but there's not much point in making partition-wise join scale
to 100,000 partitions if we've got some other part of the system that
runs into trouble beyond 250.

> Curious: would you consider joins between partitioned tables and
> non-partitioned tables where the join is pushed down to be a kind of
> "partition-wise join", or something else?  If so, would that be a
> special case, or just the logical extreme case for
> 0014-WIP-Partition-wise-join-for-1-1-1-0-0-1-partition-ma.patch, where
> one single "partition" on the non-partitioned side maps to all the
> partitions on the partitioned size?

I think this is actually a really important case which we've just
excluded from the initial scope because the problem is hard enough
already.  But it's quite possible that if you are joining partitioned
tables A and B with unpartitioned table X, the right join order could
be A-X-B; the A-X join might knock out a lot of rows.  It's not great
to have to pick between doing the A-B join partitionwise and doing the
A-X join first; you want to get both things.  But we can't do
everything at once.

Further down the road, there's more than one way of doing the A-X
join.  You could join each partition of A to all of X, which is likely
optimal if for example you are doing a nested loop with an inner index
scan on X.  But you could also partition X on the fly using A's
partitioning scheme and then join partitions of A against the
on-the-fly-partitioned version of X.  That's likely to be a lot better
for a merge join with an underlying sort on X.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to