On Thu, May 14, 2015 at 12:19:44PM -0400, Robert Haas wrote:
> Well, I'm just shooting from the hip here, but it seems to me that the
> basic pipeline as it exists today is Join -> Aggregate -> SetOp ->
> Limit -> LockRows. I don't think Limit or LockRows can be moved any
> earlier.  SetOps have a lot in common with Aggregates, though, and
> might be able to commute.  For instance, if you had an EXCEPT that was
> going to knock out most of the rows and also a DISTINCT, you might
> want to postpone the DISTINCT until after you do the EXCEPT, or you
> might just want to do both at once:

Also thinking a little from the side: an SQL query is a expression of
some tree in relational algebra, and a Path is a representation of a
way to acheive some sub-part of it.  The planner attempts to try find
alternative ways of generating path by reordering joins but AIUI
doesn't do much about aggregations.

What is essentially being discussed here is also allowing commuting
aggregations and joins.  DISTINCT and DISTINCT ON are just special
kinds of aggregations so don't need to be considered especially.

ISTM you should be able to for each aggregation note which joins it
commutes with and which it doesn't, perhaps even with a simple bitmap. 
The backbone of the plan is the order of the aggregations which
generally won't commute at all, and the joins which can float around as
long as the dependancies (stuff that won't commute) are satisfied.

> And both set operations and aggregates can sometimes commute with
> joins, which seems like the stickiest wicket, because there can't be
> more than a couple of levels of grouping or aggregation in the same
> subquery (GROUP BY, DISTINCT, UNION?) but there can be lots of joins,
> and if a particular aggregate can be implemented in lots of places,
> things start to get complicated.

I think this is basically the same idea. I'm not sure aggregates and
set operations can commute in general, unless you could somehow
(conceptually) describe them as a join/antijoin.  UNION might be
special here.

> Like, if you've got a SELECT DISTINCT ON (a.x) ... FROM
> ...lots...-type query, I think you can pretty much slap the DISTINCT
> on there at any point in the join nest, probably ideally at the point
> where the number of rows is the lowest it's going to get, or maybe
> when the sortorder is convenient.  For a GROUP BY query with ungrouped
> dependent columns, you can do the aggregation before or after those
> joins, but you must do it after the joins that are needed to provide
> all the values needed for the aggregated columns.  If you model this
> with RelOptInfos, you're basically going to need a RelOptInfo to say
> "i include these relids and these aggregation or setop operations".
> So in the worst case each agg/setop doubles the number of RelOptInfos,
> although probably in reality it doesn't because you won't have
> infinite flexibility to reorder things.

I think it's more like the number of possibilities doubles for each
join that could commute with the aggregate.  But as you say the number
of actual possibilities doesn't actually grow that fast.  I think it
may be better to have each path track the relids and aggregates it
covers, but then you need to have an efficient way to work out which
rels/aggregates can be considered for each path.  Either sounds it
sounds like quite a planner overhaul.

Hope this helps,
-- 
Martijn van Oosterhout   <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Attachment: signature.asc
Description: Digital signature

Reply via email to