Re: [HACKERS] upper planner path-ification

2015-06-23 Thread Robert Haas
On Tue, Jun 23, 2015 at 4:41 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 So, unless we don't find out a solution around planner, 2-phase aggregation is
 like a curry without rice

Simon and I spoke with Tom about this upper planner path-ification
problem at PGCon, and he indicated that he intended to work on it
soon, and that we should bug him about it if he doesn't.

I'm pleased to here this as I seem to have a special talent in that area.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-06-23 Thread Kouhei Kaigai
 -Original Message-
 From: David Rowley [mailto:david.row...@2ndquadrant.com]
 Sent: Tuesday, June 23, 2015 2:06 PM
 To: Kaigai Kouhei(海外 浩平)
 Cc: Robert Haas; pgsql-hackers@postgresql.org; Tom Lane
 Subject: Re: [HACKERS] upper planner path-ification
 
 
 On 23 June 2015 at 13:55, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 
 
   Once we support to add aggregation path during path consideration,
   we need to pay attention morphing of the final target-list according
   to the intermediate path combination, tentatively chosen.
   For example, if partial-aggregation makes sense from cost perspective;
   like SUM(NRows) of partial COUNT(*) AS NRows instead of COUNT(*) on
   billion rows, planner also has to adjust the final target-list according
   to the interim paths. In this case, final output shall be SUM(), instead
   of COUNT().
 
 
 
 
 This sounds very much like what's been discussed here:
 
 http://www.postgresql.org/message-id/CA+U5nMJ92azm0Yt8TT=hNxFP=VjFhDqFpaWfmj
 +66-4zvcg...@mail.gmail.com
 
 
 The basic concept is that we add another function set to aggregates that allow
 the combination of 2 states. For the case of MIN() and MAX() this will just be
 the same as the transfn. SUM() is similar for many types, more complex for 
 others.
 I've quite likely just borrowed SUM(BIGINT)'s transfer functions to allow
 COUNT()'s to be combined.

STDDEV, VARIANCE and relevant can be constructed using nrows, sum(X) and 
sum(X^2).
REGR_*, COVAR_* and relevant can be constructed using nrows, sum(X), sum(Y),
sum(X^2), sum(Y^2) and sum(X*Y).

Let me introduce a positive side effect of this approach.
Because final aggregate function processes values already aggregated partially,
the difference between the state value and transition value gets relatively 
small.
It reduces accidental errors around floating-point calculation. :-)

 More time does need spent inventing the new combining functions that don't
 currently exist, but that shouldn't matter as it can be done later.
 
 Commitfest link to patch here https://commitfest.postgresql.org/5/131/
 
 I see you've signed up to review it!

Yes, all of us looks at same direction.

Problem is, we have to cross the mountain of the planner enhancement to reach
all the valuable:
 - parallel aggregation
 - aggregation before join
 - remote aggregation via FDW

So, unless we don't find out a solution around planner, 2-phase aggregation is
like a curry without rice

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-06-23 Thread Kouhei Kaigai
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
 Sent: Tuesday, June 23, 2015 10:18 PM
 To: Kaigai Kouhei(海外 浩平)
 Cc: David Rowley; pgsql-hackers@postgresql.org; Tom Lane
 Subject: Re: [HACKERS] upper planner path-ification
 
 On Tue, Jun 23, 2015 at 4:41 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
  So, unless we don't find out a solution around planner, 2-phase aggregation 
  is
  like a curry without rice
 
 Simon and I spoke with Tom about this upper planner path-ification
 problem at PGCon, and he indicated that he intended to work on it
 soon, and that we should bug him about it if he doesn't.
 
 I'm pleased to here this as I seem to have a special talent in that area.

It's fantastic support arm.

So, it may be more productive to pay my efforts for investigation
of David Rowley's patch, based on the assumption if we can add
partial/final-aggregation path node as if scan/join-paths.

My interest also stands on his infrastructure.
  https://cs.uwaterloo.ca/research/tr/1993/46/file.pdf
It is a bit old paper but good starting point.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-06-22 Thread Kouhei Kaigai
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
 Sent: Thursday, May 14, 2015 10:39 AM
 To: pgsql-hackers@postgresql.org; Tom Lane
 Subject: [HACKERS] upper planner path-ification
 
 Hi,
 
 I've been pulling over Tom's occasional remarks about redoing
 grouping_planner - and maybe further layers of the planner - to work
 with Paths instead of Plans.  I've had difficulty locating all of the
 relevant threads.  Here's everything I've found so far, which I'm
 pretty sure is not everything:
 
 http://www.postgresql.org/message-id/17400.1311716...@sss.pgh.pa.us
 http://www.postgresql.org/message-id/2614.1375730...@sss.pgh.pa.us
 http://www.postgresql.org/message-id/22721.1385048...@sss.pgh.pa.us
 http://www.postgresql.org/message-id/BANLkTinDjjFHNOzESG2J2U4GOkqLu69Zqg@mai
 l.gmail.com
 http://www.postgresql.org/message-id/8479.1418420...@sss.pgh.pa.us
 
 I think there are two separate problems here.  First, there's the
 problem that grouping_planner() is complicated.  It's doing cost
 comparisons, but all in ad-hoc fashion rather than using a consistent
 mechanic the way add_path() does.  Generally, we can plan an aggregate
 using either (1) a hashed aggregate, (2) a sorted aggregate, or (3)
 for min or max, an index scan that just grabs the highest or lowest
 value in lieu of a full table scan.  Instead of generating a plan for
 each of these possibilities, we'd like to generate paths for each one,
 and then pick one to turn into a plan.  AIUI, the hope is that this
 would simplify the cost calculations, and also make it easier to
 inject other paths, such as a path where an FDW performs the
 aggregation step on the remote server.
 
 Second, there's the problem that we might like to order aggregates
 with respect to joins.  If we have something like SELECT DISTINCT ON
 (foo.x) foo.x, foo.y, bar.y FROM foo, bar WHERE foo.x = bar.x, then
 (a) if foo.x has many duplicates, it will be better to DISTINCT-ify
 foo and then join to bar afterwards but (b) if foo.x = bar.x is highly
 selective, it will be better to join to bar first and then
 DISTINCT-ify the result.  Currently, aggregation is always last;
 that's not great.  Hitoshi Harada's proposed strategy of essentially
 figuring out where the aggregation steps can go and then re-planning
 for each one is also not great, because each join problem will be a
 subtree of the one we use for the aggregate-last strategy, and thus
 we're wasting effort by planning the same subtrees multiple times.
 Instead, we might imagine letting grouping planner fish out the best
 paths for the joinrels that represent possible aggregation points,
 generate aggregation paths for each of those, and then work out what
 additional rels need to be joined afterwards.  That sounds hard, but
 not as hard as doing something sensible with what we have today.

Once we support to add aggregation path during path consideration,
we need to pay attention morphing of the final target-list according
to the intermediate path combination, tentatively chosen.
For example, if partial-aggregation makes sense from cost perspective;
like SUM(NRows) of partial COUNT(*) AS NRows instead of COUNT(*) on
billion rows, planner also has to adjust the final target-list according
to the interim paths. In this case, final output shall be SUM(), instead
of COUNT().

I think, this planner enhancement needs a mechanism to inform planner
expected final output for each Path. It means, individual Path node
can have a replacement rule: which target-entry should have what
expression instead, once this path is applied.
In above case, PartialAggPath will have an indication; SUM(nrows)
shall be applied on the location where COUNT(*) is originally
placed.

In case of relations join that involves paths with this replacement
rules, it may be a bit complicated if an expression takes argument
come from both relations; like CORR(inner_rel.X, outer_rel.Y),
I'm not sure whether we can have more broken-down partial aggregate.
However, relations on either of inner/outer side are responsible to
the columns in inner/outer side. Unless all the arguments in aggregate
function come from same side, it should not block aggregate push-
down as long as estimated cost makes sense.

That is a probably makes a problem when we allow to add aggregate
path as a part of partial plan tree.

Thanks,


 I'm inclined to think that it would be useful to solve the first
 problem even if we didn't solve the second one right away (but that
 might be wrong).  As a preparatory step, I'm thinking it would be
 sensible to split grouping_planner() into an outer function that would
 handle the addition of Limit and LockRows nodes and maybe planning of
 set operations, and an inner function that would handle GROUP BY,
 DISTINCT, and possibly window function planning.
 
 Thoughts?

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


-- 
Sent 

Re: [HACKERS] upper planner path-ification

2015-06-22 Thread David Rowley
On 23 June 2015 at 13:55, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

 Once we support to add aggregation path during path consideration,
 we need to pay attention morphing of the final target-list according
 to the intermediate path combination, tentatively chosen.
 For example, if partial-aggregation makes sense from cost perspective;
 like SUM(NRows) of partial COUNT(*) AS NRows instead of COUNT(*) on
 billion rows, planner also has to adjust the final target-list according
 to the interim paths. In this case, final output shall be SUM(), instead
 of COUNT().


This sounds very much like what's been discussed here:

http://www.postgresql.org/message-id/CA+U5nMJ92azm0Yt8TT=hNxFP=vjfhdqfpawfmj+66-4zvcg...@mail.gmail.com


The basic concept is that we add another function set to aggregates that
allow the combination of 2 states. For the case of MIN() and MAX() this
will just be the same as the transfn. SUM() is similar for many types, more
complex for others. I've quite likely just borrowed SUM(BIGINT)'s transfer
functions to allow COUNT()'s to be combined.

More time does need spent inventing the new combining functions that don't
currently exist, but that shouldn't matter as it can be done later.

Commitfest link to patch here https://commitfest.postgresql.org/5/131/

I see you've signed up to review it!

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] upper planner path-ification

2015-05-19 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom Hm.  That's a hangover from when query_planner also gave back a
 Tom Plan (singular) rather than a set of Paths.  I don't see any
 Tom fundamental reason why we couldn't generalize it to be a list of
 Tom potentially useful output orderings rather than just one.  But I'm
 Tom a bit concerned about the ensuing growth in planning time; is it
 Tom really all that useful?

The planning time growth is a possible concern, yes. The potential gain
is eliminating one sort step, in the case when the input has a usable
sorted path but grouping_planner happens not to ask for it (when there's
more than just a single rollup, the code currently asks for one of the
sort orders pretty much arbitrarily since it has no real way to know
otherwise). Whether that would justify it... I don't know. Maybe that's
one to save for later to see if there's any feedback from actual use.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-19 Thread Kyotaro HORIGUCHI
At Tue, 19 May 2015 09:04:00 -0400, Robert Haas robertmh...@gmail.com wrote 
in CA+TgmobAV3_DS1sXA+PFWkjvX1K-VNiAnMMJrzPfD43g=-4...@mail.gmail.com
 On Tue, May 19, 2015 at 7:19 AM, Andrew Gierth
 and...@tao11.riddles.org.uk wrote:
  Tom == Tom Lane t...@sss.pgh.pa.us writes:
   Tom Hm.  That's a hangover from when query_planner also gave back a
   Tom Plan (singular) rather than a set of Paths.  I don't see any
   Tom fundamental reason why we couldn't generalize it to be a list of
   Tom potentially useful output orderings rather than just one.  But I'm
   Tom a bit concerned about the ensuing growth in planning time; is it
   Tom really all that useful?
 
  The planning time growth is a possible concern, yes. The potential gain
  is eliminating one sort step, in the case when the input has a usable
  sorted path but grouping_planner happens not to ask for it (when there's
  more than just a single rollup, the code currently asks for one of the
  sort orders pretty much arbitrarily since it has no real way to know
  otherwise). Whether that would justify it... I don't know. Maybe that's
  one to save for later to see if there's any feedback from actual use.
 
 I kind of doubt that the growth in planning time would be anything too
 unreasonable.  We already consider multiple orderings for ordinary
 base relations, so it's not very obvious why consideration multiple
 orderings for subqueries would be any worse.  If we can arrange to
 throw away useless orderings early, as we do in other cases, then any
 extra paths we consider have a reasonable chance of being useful.

Though I don't think that the simple path-ification of what is
currently done make it grow in any degree, it could rapidly grow
if we unconditionally construct extra upper-paths using the
previously-abandoned extra paths or make them involved in join
considerations. But the growth in planning time could be kept
reasonable if we pay attention so that, as we have done so, the
additional optimization schemes to have simple and precise
bailing-out logic even if they require some complicated
calculation or yields more extra paths.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-19 Thread Robert Haas
On Tue, May 19, 2015 at 7:19 AM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom Hm.  That's a hangover from when query_planner also gave back a
  Tom Plan (singular) rather than a set of Paths.  I don't see any
  Tom fundamental reason why we couldn't generalize it to be a list of
  Tom potentially useful output orderings rather than just one.  But I'm
  Tom a bit concerned about the ensuing growth in planning time; is it
  Tom really all that useful?

 The planning time growth is a possible concern, yes. The potential gain
 is eliminating one sort step, in the case when the input has a usable
 sorted path but grouping_planner happens not to ask for it (when there's
 more than just a single rollup, the code currently asks for one of the
 sort orders pretty much arbitrarily since it has no real way to know
 otherwise). Whether that would justify it... I don't know. Maybe that's
 one to save for later to see if there's any feedback from actual use.

I kind of doubt that the growth in planning time would be anything too
unreasonable.  We already consider multiple orderings for ordinary
base relations, so it's not very obvious why consideration multiple
orderings for subqueries would be any worse.  If we can arrange to
throw away useless orderings early, as we do in other cases, then any
extra paths we consider have a reasonable chance of being useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-18 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Incidentally, the most obvious obstacle to better planning of grouping
 sets in the sorted cases is not so much how to pick paths in
 grouping_planner itself, but rather the fact that query_planner wants to
 be given only one sort order. Is there any prospect for improvement
 there?

Hm.  That's a hangover from when query_planner also gave back a Plan
(singular) rather than a set of Paths.  I don't see any fundamental reason
why we couldn't generalize it to be a list of potentially useful output
orderings rather than just one.  But I'm a bit concerned about the ensuing
growth in planning time; is it really all that useful?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-18 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  Hrm, ok. So for the near future, we should leave it more or less
  as-is?  We don't have a timescale yet, but it's our intention to
  submit a hashagg support patch for grouping sets as soon as time
  permits.

 Tom Well, mumble.  I keep saying that I want to tackle path-ification
 Tom in that area, and I keep not finding the time to actually do it.
 Tom So I'm hesitant to tell you that you should wait on it.  But
 Tom certainly I think that it'll be a lot easier to get hashagg
 Tom costing done in that framework than in what currently exists.

Incidentally, the most obvious obstacle to better planning of grouping
sets in the sorted cases is not so much how to pick paths in
grouping_planner itself, but rather the fact that query_planner wants to
be given only one sort order. Is there any prospect for improvement
there?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-18 Thread Robert Haas
On Sun, May 17, 2015 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Rather than adding tlists per se to Paths, I've been vaguely toying with
 a notion of identifying all the interesting subexpressions in a query
 (expensive functions, aggregates, etc), giving them indexes 1..n, and then
 marking Paths with bitmapsets showing which interesting subexpressions
 they can produce values for.  This would make things like does this Path
 compute all the needed aggregates much cheaper to deal with than a raw
 tlist representation would do.  But maybe that's still not the best way.

I don't know, but it seems like this might be pulling in the opposite
direction from your previously-stated desire to get subquery_planner
to output Paths rather than Plans as soon as possible.  I'm inclined
to think that we need to make a start on this in whatever way is
simplest, and then refine it later.  Adding tlists to (some?) paths
might not be the most beautiful way to get there, but there's a lot of
important stuff piling up behind making some kind of a start on this
refactoring, and we can't just keep postponing all of that stuff
indefinitely.  Or if we do, the project as a whole will be worse off
for it, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, May 17, 2015 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Rather than adding tlists per se to Paths, I've been vaguely toying with
 a notion of identifying all the interesting subexpressions in a query
 (expensive functions, aggregates, etc), giving them indexes 1..n, and then
 marking Paths with bitmapsets showing which interesting subexpressions
 they can produce values for.  This would make things like does this Path
 compute all the needed aggregates much cheaper to deal with than a raw
 tlist representation would do.  But maybe that's still not the best way.

 I don't know, but it seems like this might be pulling in the opposite
 direction from your previously-stated desire to get subquery_planner
 to output Paths rather than Plans as soon as possible.

Sorry, I didn't mean to suggest that that necessarily had to happen right
away.

What we do need right away, though, is *some* design for distinguishing
Paths for the different possible upper-level steps.  I won't cry if we
change it around later, but we have to have something to start with.

So for the moment, let's assume that we still rigidly follow the sequence
of upper-level steps currently embodied in grouping_planner.  (I'm not
sure if it even makes sense to consider other orderings of those
processing steps, but in any case we don't need to allow it on day zero.)
Then, make a dummy RelOptInfo corresponding to the result of each step,
and insert links to those in new fields in PlannerInfo.  (We create these
*before* starting scan/join planning, so that FDWs, custom scans, etc, can
inject paths into these RelOptInfos if they want, so as to represent cases
like remote aggregation.)  Then just use add_path with the appropriate
target RelOptInfo when producing different ways to do grouping etc.

This is a bit ad-hoc but it would be a place to start.

Comments?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-18 Thread Simon Riggs
On 18 May 2015 at 14:50, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  On Sun, May 17, 2015 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Rather than adding tlists per se to Paths, I've been vaguely toying with
  a notion of identifying all the interesting subexpressions in a query
  (expensive functions, aggregates, etc), giving them indexes 1..n, and
 then
  marking Paths with bitmapsets showing which interesting subexpressions
  they can produce values for.  This would make things like does this
 Path
  compute all the needed aggregates much cheaper to deal with than a raw
  tlist representation would do.  But maybe that's still not the best way.

  I don't know, but it seems like this might be pulling in the opposite
  direction from your previously-stated desire to get subquery_planner
  to output Paths rather than Plans as soon as possible.

 Sorry, I didn't mean to suggest that that necessarily had to happen right
 away.

 What we do need right away, though, is *some* design for distinguishing
 Paths for the different possible upper-level steps.  I won't cry if we
 change it around later, but we have to have something to start with.

 So for the moment, let's assume that we still rigidly follow the sequence
 of upper-level steps currently embodied in grouping_planner.  (I'm not
 sure if it even makes sense to consider other orderings of those
 processing steps, but in any case we don't need to allow it on day zero.)
 Then, make a dummy RelOptInfo corresponding to the result of each step,
 and insert links to those in new fields in PlannerInfo.  (We create these
 *before* starting scan/join planning, so that FDWs, custom scans, etc, can
 inject paths into these RelOptInfos if they want, so as to represent cases
 like remote aggregation.)  Then just use add_path with the appropriate
 target RelOptInfo when producing different ways to do grouping etc.

 This is a bit ad-hoc but it would be a place to start.

 Comments?


My thinking was to push aggregation down to the lowest level possible in
the plan, hopefully a single relation. That way we can generate paths for
the current grouping_planner options as well as others, such as these

* Push down aggregate prior to a join (which might then affect join
planning)
* Allow parallel queries to follow a
scan-aggregate-collectfromslaves-aggregate strategy (hence need for double
aggregation semantics)
* Allow a lookaside to a Mat View rather than do a scan-aggregate (assume
for now these are maintained correctly)
* Allow a lookaside to an alternate datastore/mechanism via CustomScan
(assume these are maintained correctly)

all of which need to be costed against each other and the current
strategies (aggregate last).

The above proposal sounds like it will do that, but not completely sure.

I'm assuming the O(N^2) Mat View planning problem can be solved in part by
recognizing that many MVs are just single-table plus aggregates, and that
we'd have a small enough number of MVs in play that search would not be a
problem in practice.

I'm also aware that LIMIT is still very badly optimized, so I'm hoping it
helps there also.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


Re: [HACKERS] upper planner path-ification

2015-05-18 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 18 May 2015 at 14:50, Tom Lane t...@sss.pgh.pa.us wrote:
 So for the moment, let's assume that we still rigidly follow the sequence
 of upper-level steps currently embodied in grouping_planner.  (I'm not
 sure if it even makes sense to consider other orderings of those
 processing steps, but in any case we don't need to allow it on day zero.)
 Then, make a dummy RelOptInfo corresponding to the result of each step,
 and insert links to those in new fields in PlannerInfo.  (We create these
 *before* starting scan/join planning, so that FDWs, custom scans, etc, can
 inject paths into these RelOptInfos if they want, so as to represent cases
 like remote aggregation.)  Then just use add_path with the appropriate
 target RelOptInfo when producing different ways to do grouping etc.
 
 This is a bit ad-hoc but it would be a place to start.

 My thinking was to push aggregation down to the lowest level possible in
 the plan, hopefully a single relation.

In this sketch, it's basically up to an FDW to recognize when remote
aggregation is possible, and put a suitable ForeignScan path into the
RelOptInfo that corresponds to aggregate output.  It could do that for
either a single foreign relation or a foreign join, if it had enough
smarts.  This is more or less where we are today for foreign queries of
all types: the responsibility is totally on the FDW to figure out whether
it can optimize the query.  I have hopes that we can later provide some
infrastructure that would assist FDWs in that, so they're not all
reinventing the wheel.  But I don't yet have much idea what such
infrastructure ought to look like, and it's probably hopeless to try to
design it until things have stabilized more.

Most of the other stuff you're talking about is entirely off the table
until we think about ways that aggregation could be done below the top
level, which this sketch isn't pretending to address.  I agree with
Robert that that's probably best left till later.  There's a fair amount
of work to do before we even have this much done.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-18 Thread Robert Haas
On Mon, May 18, 2015 at 2:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't know, but it seems like this might be pulling in the opposite
 direction from your previously-stated desire to get subquery_planner
 to output Paths rather than Plans as soon as possible.

 Sorry, I didn't mean to suggest that that necessarily had to happen right
 away.

 What we do need right away, though, is *some* design for distinguishing
 Paths for the different possible upper-level steps.  I won't cry if we
 change it around later, but we have to have something to start with.

That sounds good to me.

 So for the moment, let's assume that we still rigidly follow the sequence
 of upper-level steps currently embodied in grouping_planner.  (I'm not
 sure if it even makes sense to consider other orderings of those
 processing steps, but in any case we don't need to allow it on day zero.)
 Then, make a dummy RelOptInfo corresponding to the result of each step,
 and insert links to those in new fields in PlannerInfo.  (We create these
 *before* starting scan/join planning, so that FDWs, custom scans, etc, can
 inject paths into these RelOptInfos if they want, so as to represent cases
 like remote aggregation.)  Then just use add_path with the appropriate
 target RelOptInfo when producing different ways to do grouping etc.

 This is a bit ad-hoc but it would be a place to start.

 Comments?

That sounds reasonable, but I think the key thing is to nail down what
the new Path types are going to look like.  If we know that, then
rearranging grouping_planner becomes a matter of adjusting things
piece by piece for the new data structures.  Or at least I think it
does.

I think it would also be an excellent idea to split each of those
upper level steps embodied in grouping_planner into its own
function.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-17 Thread Andrew Gierth
 Robert == Robert Haas robertmh...@gmail.com writes:

 Robert I think grouping_planner() is badly in need of some refactoring
 Robert just to make it shorter.  It's over 1000 lines of code, which
 Robert IMHO is a fairly ridiculous length for a single function.

If there's interest, we could do that specific task as part of adding
hashagg support for grouping sets (which would otherwise make it even
longer), or as preparatory work for that.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-17 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Robert == Robert Haas robertmh...@gmail.com writes:
  Robert I think grouping_planner() is badly in need of some refactoring
  Robert just to make it shorter.  It's over 1000 lines of code, which
  Robert IMHO is a fairly ridiculous length for a single function.

 If there's interest, we could do that specific task as part of adding
 hashagg support for grouping sets (which would otherwise make it even
 longer), or as preparatory work for that.

I think that refactoring without changing anything about the way it works
will be painful and probably ultimately a dead end.  As an example, the
current_pathkeys local variable is state that's needed throughout that
process, so you'd need some messy notation to pass it around (unless you
stuck it into PlannerRoot, which would be ugly too).  But that would
go away in a path-ified universe, because each Path would be marked as
to its output sort order.  More, a lot of the code that you'd be
relocating is code that we should be trying to get rid of altogether,
not just relocate (to wit all the stuff that does cost-based comparisons
of alternatives).

So I'm all for refactoring, but I think it will happen as a natural
byproduct of path-ification, and otherwise would be rather forced.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-17 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  If there's interest, we could do that specific task as part of
  adding hashagg support for grouping sets (which would otherwise make
  it even longer), or as preparatory work for that.

 Tom I think that refactoring without changing anything about the way
 Tom it works will be painful and probably ultimately a dead end.  As
 Tom an example, the current_pathkeys local variable is state that's
 Tom needed throughout that process, so you'd need some messy notation
 Tom to pass it around (unless you stuck it into PlannerRoot, which
 Tom would be ugly too).  But that would go away in a path-ified
 Tom universe, because each Path would be marked as to its output sort
 Tom order.  More, a lot of the code that you'd be relocating is code
 Tom that we should be trying to get rid of altogether, not just
 Tom relocate (to wit all the stuff that does cost-based comparisons of
 Tom alternatives).

 Tom So I'm all for refactoring, but I think it will happen as a natural
 Tom byproduct of path-ification, and otherwise would be rather forced.

Hrm, ok. So for the near future, we should leave it more or less as-is?
We don't have a timescale yet, but it's our intention to submit a
hashagg support patch for grouping sets as soon as time permits.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-17 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom So I'm all for refactoring, but I think it will happen as a natural
  Tom byproduct of path-ification, and otherwise would be rather forced.

 Hrm, ok. So for the near future, we should leave it more or less as-is?
 We don't have a timescale yet, but it's our intention to submit a
 hashagg support patch for grouping sets as soon as time permits.

Well, mumble.  I keep saying that I want to tackle path-ification in
that area, and I keep not finding the time to actually do it.  So I'm
hesitant to tell you that you should wait on it.  But certainly I think
that it'll be a lot easier to get hashagg costing done in that framework
than in what currently exists.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 So, getting back to this part, what's the value of returning a list of
 Paths rather than a list of Plans?

(1) less work, since we don't have to fill in details not needed for
costing purposes;
(2) paths carry info that the planner wants but the executor doesn't,
notably sort-order annotations.

 target lists are normally computed when paths are converted to plans,
 but for the higher-level plan nodes adding by grouping_planner, the
 path list is typically just passed in.  So would the new path types be
 expected to carry target lists of their own, or would they need to
 figure out the target list on the fly at plan generation time?

Yeah, that is something I've been struggling with while thinking about
this.  I don't really want to add tlists as such to Paths, but it's
not very clear how else to annotate a Path as to what it computes,
and that seems like an annotation we have to have in some form in order
to convert these planning steps into a Path universe.

There are other cases where it would be useful to have some notion of
this kind.  An example is that right now, if you have an expression index
on an expensive function and a query that wants the value of that function,
the planner is able to extract the value from the index --- but there is
nothing that gives any cost benefit to doing so, so it's just as likely
to choose some other index and eat the cost of recalculating the function.
It seems like the only way to fix that in a principled fashion is to have
some concept that the index-scan Path can produce the function value,
and then when we come to some appropriate costing step, penalize the other
paths for having to compute the value that's available for free from this
one.

Rather than adding tlists per se to Paths, I've been vaguely toying with
a notion of identifying all the interesting subexpressions in a query
(expensive functions, aggregates, etc), giving them indexes 1..n, and then
marking Paths with bitmapsets showing which interesting subexpressions
they can produce values for.  This would make things like does this Path
compute all the needed aggregates much cheaper to deal with than a raw
tlist representation would do.  But maybe that's still not the best way.

Another point is that a Path that computes aggregates is fundamentally
different from a Path that doesn't, because it doesn't even produce the
same number of rows.  So I'm not at all sure how to visualize the idea
of a Path that computes only some aggregates, or whether it's even a
sensible thing to worry about supporting.

 One thing that seems like it might complicate things here is that a
 lot of planner functions take PlannerInfo *root as an argument.  But
 if we generate only paths in grouping_planner() and path-ify them
 later, the subquery's root will not be available when we're trying to
 do the Path - Plan transformation.

Ah, you're wrong there, because we hang onto the subquery's root already
(I forget why exactly, but see PlannerGlobal.subroots for SubPlans, and
RelOptInfo.subroot for subquery-in-FROM).  So it would not be a
fundamental problem to postpone create_plan() for a subquery.

 I think grouping_planner() is badly in need of some refactoring just
 to make it shorter.  It's over 1000 lines of code, which IMHO is a
 fairly ridiculous length for a single function.

Amen to that.  But as I said to Andrew, I think this will be a side-effect
of path-ification in this area, and is probably not something to set out
to do first.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-16 Thread Robert Haas
On Wed, May 13, 2015 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 For the reasons I mentioned, I'd like to get to a point where
 subquery_planner's output is Paths not Plans as soon as possible.  But the
 idea of coarse representation of steps that we aren't trying to be smart
 about might be useful to save some labor in the short run.

 The zero-order version of that might be a single Path node type that
 represents do whatever grouping_planner would do, which we'd start to
 break down into multiple node types once we had the other APIs fixed.

So, getting back to this part, what's the value of returning a list of
Paths rather than a list of Plans?  It seems that, in general terms,
what we're trying to do is delay performing some of the work, so that
we can generate several candidates relatively inexpensively and decide
only later which one to turn into a Plan.  What's not entirely clear
to me is which parts of the work we're trying to delay.  For example,
target lists are normally computed when paths are converted to plans,
but for the higher-level plan nodes adding by grouping_planner, the
path list is typically just passed in.  So would the new path types be
expected to carry target lists of their own, or would they need to
figure out the target list on the fly at plan generation time?

One thing that seems like it might complicate things here is that a
lot of planner functions take PlannerInfo *root as an argument.  But
if we generate only paths in grouping_planner() and path-ify them
later, the subquery's root will not be available when we're trying to
do the Path - Plan transformation.  Presumably that means anything we
need from the PlannerInfo has to be dug out and saved in the path.

I think grouping_planner() is badly in need of some refactoring just
to make it shorter.  It's over 1000 lines of code, which IMHO is a
fairly ridiculous length for a single function.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, May 13, 2015 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 For the reasons I mentioned, I'd like to get to a point where
 subquery_planner's output is Paths not Plans as soon as possible.  But the
 idea of coarse representation of steps that we aren't trying to be smart
 about might be useful to save some labor in the short run.
 
 The zero-order version of that might be a single Path node type that
 represents do whatever grouping_planner would do, which we'd start to
 break down into multiple node types once we had the other APIs fixed.

 The problem I'm really interested in solving is gaining the ability to
 add additional aggregation strategies, such as letting an FDW do it
 remotely, or doing it in parallel.  It seems to me that your proposed
 zero-order version of that wouldn't really get us terribly far toward
 that goal - it would be more oriented towards solving the other
 problems you mention, specifically adding more intelligence to setops
 and allowing parameterization of subqueries.  Those things certainly
 have some value, but I think supporting alternate aggregation
 strategies is a lot more interesting.

Clearly we'd like to get to both goals.  I don't see the zero order
design as something we'd ship or even have in the tree for more than
a short time.  But it might still be a useful refactorization step.

In any case, the key question if we're to have Paths representing
higher-level computations is what do we hang our lists of such Paths
off of?.  If we have say both GROUP BY and LIMIT, it's important to
distinguish Paths that purport to do only the grouping step from those
that do both the grouping and the limit.  For the scan/join part of
planning, we do this by attaching the Paths to RelOptInfos that denote
various levels of joining ... but how does that translate to the higher
level processing steps?  Perhaps we could make dummy RelOptInfos that
correspond to having completed different steps of processing; but I've
not got a clear idea about how many such RelOptInfos we'd need, and
in particular not about whether we need to cater for completing those
steps in different orders.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-14 Thread Robert Haas
On Thu, May 14, 2015 at 10:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In any case, the key question if we're to have Paths representing
 higher-level computations is what do we hang our lists of such Paths
 off of?.

Yeah, I was wondering about that, too.

 If we have say both GROUP BY and LIMIT, it's important to
 distinguish Paths that purport to do only the grouping step from those
 that do both the grouping and the limit.  For the scan/join part of
 planning, we do this by attaching the Paths to RelOptInfos that denote
 various levels of joining ... but how does that translate to the higher
 level processing steps?  Perhaps we could make dummy RelOptInfos that
 correspond to having completed different steps of processing; but I've
 not got a clear idea about how many such RelOptInfos we'd need, and
 in particular not about whether we need to cater for completing those
 steps in different orders.

Well, I'm just shooting from the hip here, but it seems to me that the
basic pipeline as it exists today is Join - Aggregate - SetOp -
Limit - LockRows. I don't think Limit or LockRows can be moved any
earlier.  SetOps have a lot in common with Aggregates, though, and
might be able to commute.  For instance, if you had an EXCEPT that was
going to knock out most of the rows and also a DISTINCT, you might
want to postpone the DISTINCT until after you do the EXCEPT, or you
might just want to do both at once:

rhaas=# explain select distinct * from (values (1), (1), (2), (3)) x
except select 3;
 QUERY PLAN
-
 HashSetOp Except  (cost=0.06..0.17 rows=4 width=3)
   -  Append  (cost=0.06..0.16 rows=5 width=3)
 -  Subquery Scan on *SELECT* 1  (cost=0.06..0.14 rows=4 width=4)
   -  HashAggregate  (cost=0.06..0.10 rows=4 width=4)
 Group Key: *VALUES*.column1
 -  Values Scan on *VALUES*  (cost=0.00..0.05
rows=4 width=4)
 -  Subquery Scan on *SELECT* 2  (cost=0.00..0.02 rows=1 width=0)
   -  Result  (cost=0.00..0.01 rows=1 width=0)
(8 rows)

In an ideal world, we'd only hash once.

And both set operations and aggregates can sometimes commute with
joins, which seems like the stickiest wicket, because there can't be
more than a couple of levels of grouping or aggregation in the same
subquery (GROUP BY, DISTINCT, UNION?) but there can be lots of joins,
and if a particular aggregate can be implemented in lots of places,
things start to get complicated.

Like, if you've got a SELECT DISTINCT ON (a.x) ... FROM
...lots...-type query, I think you can pretty much slap the DISTINCT
on there at any point in the join nest, probably ideally at the point
where the number of rows is the lowest it's going to get, or maybe
when the sortorder is convenient.  For a GROUP BY query with ungrouped
dependent columns, you can do the aggregation before or after those
joins, but you must do it after the joins that are needed to provide
all the values needed for the aggregated columns.  If you model this
with RelOptInfos, you're basically going to need a RelOptInfo to say
i include these relids and these aggregation or setop operations.
So in the worst case each agg/setop doubles the number of RelOptInfos,
although probably in reality it doesn't because you won't have
infinite flexibility to reorder things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-14 Thread Martijn van Oosterhout
On Thu, May 14, 2015 at 12:19:44PM -0400, Robert Haas wrote:
 Well, I'm just shooting from the hip here, but it seems to me that the
 basic pipeline as it exists today is Join - Aggregate - SetOp -
 Limit - LockRows. I don't think Limit or LockRows can be moved any
 earlier.  SetOps have a lot in common with Aggregates, though, and
 might be able to commute.  For instance, if you had an EXCEPT that was
 going to knock out most of the rows and also a DISTINCT, you might
 want to postpone the DISTINCT until after you do the EXCEPT, or you
 might just want to do both at once:

Also thinking a little from the side: an SQL query is a expression of
some tree in relational algebra, and a Path is a representation of a
way to acheive some sub-part of it.  The planner attempts to try find
alternative ways of generating path by reordering joins but AIUI
doesn't do much about aggregations.

What is essentially being discussed here is also allowing commuting
aggregations and joins.  DISTINCT and DISTINCT ON are just special
kinds of aggregations so don't need to be considered especially.

ISTM you should be able to for each aggregation note which joins it
commutes with and which it doesn't, perhaps even with a simple bitmap. 
The backbone of the plan is the order of the aggregations which
generally won't commute at all, and the joins which can float around as
long as the dependancies (stuff that won't commute) are satisfied.

 And both set operations and aggregates can sometimes commute with
 joins, which seems like the stickiest wicket, because there can't be
 more than a couple of levels of grouping or aggregation in the same
 subquery (GROUP BY, DISTINCT, UNION?) but there can be lots of joins,
 and if a particular aggregate can be implemented in lots of places,
 things start to get complicated.

I think this is basically the same idea. I'm not sure aggregates and
set operations can commute in general, unless you could somehow
(conceptually) describe them as a join/antijoin.  UNION might be
special here.

 Like, if you've got a SELECT DISTINCT ON (a.x) ... FROM
 ...lots...-type query, I think you can pretty much slap the DISTINCT
 on there at any point in the join nest, probably ideally at the point
 where the number of rows is the lowest it's going to get, or maybe
 when the sortorder is convenient.  For a GROUP BY query with ungrouped
 dependent columns, you can do the aggregation before or after those
 joins, but you must do it after the joins that are needed to provide
 all the values needed for the aggregated columns.  If you model this
 with RelOptInfos, you're basically going to need a RelOptInfo to say
 i include these relids and these aggregation or setop operations.
 So in the worst case each agg/setop doubles the number of RelOptInfos,
 although probably in reality it doesn't because you won't have
 infinite flexibility to reorder things.

I think it's more like the number of possibilities doubles for each
join that could commute with the aggregate.  But as you say the number
of actual possibilities doesn't actually grow that fast.  I think it
may be better to have each path track the relids and aggregates it
covers, but then you need to have an efficient way to work out which
rels/aggregates can be considered for each path.  Either sounds it
sounds like quite a planner overhaul.

Hope this helps,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] upper planner path-ification

2015-05-14 Thread Ashutosh Bapat
On Thu, May 14, 2015 at 7:09 AM, Robert Haas robertmh...@gmail.com wrote:

 Hi,

 I've been pulling over Tom's occasional remarks about redoing
 grouping_planner - and maybe further layers of the planner - to work
 with Paths instead of Plans.  I've had difficulty locating all of the
 relevant threads.  Here's everything I've found so far, which I'm
 pretty sure is not everything:

 http://www.postgresql.org/message-id/17400.1311716...@sss.pgh.pa.us
 http://www.postgresql.org/message-id/2614.1375730...@sss.pgh.pa.us
 http://www.postgresql.org/message-id/22721.1385048...@sss.pgh.pa.us

 http://www.postgresql.org/message-id/banlktindjjfhnozesg2j2u4gokqlu69...@mail.gmail.com
 http://www.postgresql.org/message-id/8479.1418420...@sss.pgh.pa.us

 I think there are two separate problems here.  First, there's the
 problem that grouping_planner() is complicated.  It's doing cost
 comparisons, but all in ad-hoc fashion rather than using a consistent
 mechanic the way add_path() does.  Generally, we can plan an aggregate
 using either (1) a hashed aggregate, (2) a sorted aggregate, or (3)
 for min or max, an index scan that just grabs the highest or lowest
 value in lieu of a full table scan.  Instead of generating a plan for
 each of these possibilities, we'd like to generate paths for each one,
 and then pick one to turn into a plan.  AIUI, the hope is that this
 would simplify the cost calculations, and also make it easier to
 inject other paths, such as a path where an FDW performs the
 aggregation step on the remote server.

 Second, there's the problem that we might like to order aggregates
 with respect to joins.  If we have something like SELECT DISTINCT ON
 (foo.x) foo.x, foo.y, bar.y FROM foo, bar WHERE foo.x = bar.x, then
 (a) if foo.x has many duplicates, it will be better to DISTINCT-ify
 foo and then join to bar afterwards but (b) if foo.x = bar.x is highly
 selective, it will be better to join to bar first and then
 DISTINCT-ify the result.  Currently, aggregation is always last;
 that's not great.  Hitoshi Harada's proposed strategy of essentially
 figuring out where the aggregation steps can go and then re-planning
 for each one is also not great, because each join problem will be a
 subtree of the one we use for the aggregate-last strategy, and thus
 we're wasting effort by planning the same subtrees multiple times.
 Instead, we might imagine letting grouping planner fish out the best
 paths for the joinrels that represent possible aggregation points,
 generate aggregation paths for each of those, and then work out what
 additional rels need to be joined afterwards.  That sounds hard, but
 not as hard as doing something sensible with what we have today.


Another futuristic avenue for optimization would be commuting Append and
Join when joining two similarly partitioned tables.


 I'm inclined to think that it would be useful to solve the first
 problem even if we didn't solve the second one right away (but that
 might be wrong).  As a preparatory step, I'm thinking it would be
 sensible to split grouping_planner() into an outer function that would
 handle the addition of Limit and LockRows nodes and maybe planning of
 set operations, and an inner function that would handle GROUP BY,
 DISTINCT, and possibly window function planning.

 Thoughts?

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] upper planner path-ification

2015-05-14 Thread Ashutosh Bapat
On Thu, May 14, 2015 at 8:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  On Wed, May 13, 2015 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  For the reasons I mentioned, I'd like to get to a point where
  subquery_planner's output is Paths not Plans as soon as possible.  But
 the
  idea of coarse representation of steps that we aren't trying to be smart
  about might be useful to save some labor in the short run.
 
  The zero-order version of that might be a single Path node type that
  represents do whatever grouping_planner would do, which we'd start to
  break down into multiple node types once we had the other APIs fixed.

  The problem I'm really interested in solving is gaining the ability to
  add additional aggregation strategies, such as letting an FDW do it
  remotely, or doing it in parallel.  It seems to me that your proposed
  zero-order version of that wouldn't really get us terribly far toward
  that goal - it would be more oriented towards solving the other
  problems you mention, specifically adding more intelligence to setops
  and allowing parameterization of subqueries.  Those things certainly
  have some value, but I think supporting alternate aggregation
  strategies is a lot more interesting.

 Clearly we'd like to get to both goals.  I don't see the zero order
 design as something we'd ship or even have in the tree for more than
 a short time.  But it might still be a useful refactorization step.

 In any case, the key question if we're to have Paths representing
 higher-level computations is what do we hang our lists of such Paths
 off of?.  If we have say both GROUP BY and LIMIT, it's important to
 distinguish Paths that purport to do only the grouping step from those
 that do both the grouping and the limit.  For the scan/join part of
 planning, we do this by attaching the Paths to RelOptInfos that denote
 various levels of joining ... but how does that translate to the higher
 level processing steps?  Perhaps we could make dummy RelOptInfos that
 correspond to having completed different steps of processing; but I've
 not got a clear idea about how many such RelOptInfos we'd need, and
 in particular not about whether we need to cater for completing those
 steps in different orders.


Given that the results of such computations are relations, having a
RelOptInfo for each operation looks natural. Although, Sort/Order, which
doesn't alter the result-set but just re-orders it, may be an exception
here. It can be modeled as a property of some RelOptInfo rather than
modelling as a RelOptInfo by itself. Same might be the case of row locking
operation.

As Robert has mentioned earlier, reordering does open opportunities for
optimizations and hence can not be ignored. If we are restructuring the
planner, we should restructure to enable such reordering. One possible
solution would be to imitate make_one_rel(). make_one_rel() creates
possible relations (and paths for each of them) which represent joining
order for a given joinlist. Similarly, given a list of operations requested
by user, a function would generate relations (and paths for each of them)
which represent possible orders of those operations (operations also
include JOINs). But that might explode the number of relations (and hence
paths) we examine during planning.


 regards, tom lane


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] upper planner path-ification

2015-05-14 Thread Kyotaro HORIGUCHI
Hello, this topic lured me on..

At Wed, 13 May 2015 23:43:57 -0400, Robert Haas robertmh...@gmail.com wrote 
in ca+tgmoaqa6bcasgcl8toxwmmoom-d7ebesadz4y58cb+tjq...@mail.gmail.com
 On Wed, May 13, 2015 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Both of those are problems all right, but there is more context here.
 
 Thanks for providing the context.
 
  I'm inclined to think that it would be useful to solve the first
  problem even if we didn't solve the second one right away (but that
  might be wrong).  As a preparatory step, I'm thinking it would be
  sensible to split grouping_planner() into an outer function that would
  handle the addition of Limit and LockRows nodes and maybe planning of
  set operations, and an inner function that would handle GROUP BY,
  DISTINCT, and possibly window function planning.
 
  For the reasons I mentioned, I'd like to get to a point where
  subquery_planner's output is Paths not Plans as soon as possible.  But the
  idea of coarse representation of steps that we aren't trying to be smart
  about might be useful to save some labor in the short run.
 
  The zero-order version of that might be a single Path node type that
  represents do whatever grouping_planner would do, which we'd start to
  break down into multiple node types once we had the other APIs fixed.
 
 The problem I'm really interested in solving is gaining the ability to
 add additional aggregation strategies, such as letting an FDW do it
 remotely, or doing it in parallel.  It seems to me that your proposed
 zero-order version of that wouldn't really get us terribly far toward
 that goal - it would be more oriented towards solving the other
 problems you mention, specifically adding more intelligence to setops
 and allowing parameterization of subqueries.  Those things certainly
 have some value, but I think supporting alternate aggregation
 strategies is a lot more interesting.

I don't know which you are planning the interface between the
split layers to be path or plan, I guess simply splitting
grouping_planner results in the latter.  Although Tom's path
representation up to higher layer does not seem to clash with,
or have anything to do directly with your layer splitting,
complicating the existing complication should make it more
difficult to refactor the planner to have more intelligence.

For example, Limit could be pushed down into bottom of a path
tree ideally and would be also effective for remote node greately
like remote aggregations would be. The zero-order version would
eliminate subqery scan (or frozen plan) path or such nodes (if
any) which obstruct possible global optimization. But simple
splitting of the current grouping_planner looks make it more
difficult. It is so frustrating not to be able to express my
thought well but IMHO, shortly, I wish the planner not to grow to
be in such form.

Certainly it seems not so easy to get over the zero-order
version, but..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-13 Thread Robert Haas
On Wed, May 13, 2015 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Both of those are problems all right, but there is more context here.

Thanks for providing the context.

 I'm inclined to think that it would be useful to solve the first
 problem even if we didn't solve the second one right away (but that
 might be wrong).  As a preparatory step, I'm thinking it would be
 sensible to split grouping_planner() into an outer function that would
 handle the addition of Limit and LockRows nodes and maybe planning of
 set operations, and an inner function that would handle GROUP BY,
 DISTINCT, and possibly window function planning.

 For the reasons I mentioned, I'd like to get to a point where
 subquery_planner's output is Paths not Plans as soon as possible.  But the
 idea of coarse representation of steps that we aren't trying to be smart
 about might be useful to save some labor in the short run.

 The zero-order version of that might be a single Path node type that
 represents do whatever grouping_planner would do, which we'd start to
 break down into multiple node types once we had the other APIs fixed.

The problem I'm really interested in solving is gaining the ability to
add additional aggregation strategies, such as letting an FDW do it
remotely, or doing it in parallel.  It seems to me that your proposed
zero-order version of that wouldn't really get us terribly far toward
that goal - it would be more oriented towards solving the other
problems you mention, specifically adding more intelligence to setops
and allowing parameterization of subqueries.  Those things certainly
have some value, but I think supporting alternate aggregation
strategies is a lot more interesting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-05-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I've been pulling over Tom's occasional remarks about redoing
 grouping_planner - and maybe further layers of the planner - to work
 with Paths instead of Plans. ...
 I think there are two separate problems here.  First, there's the
 problem that grouping_planner() is complicated.
 Second, there's the problem that we might like to order aggregates
 with respect to joins.

Both of those are problems all right, but there is more context here.

* As some of the messages you cited mention, we would like to have Path
representations for things like aggregation, because that's the only
way we'll get to a sane API that lets FDWs propose remote aggregation.

* We have also had requests for the planner to be smarter about
UNION/INTERSECT/EXCEPT queries.  Again, that requires cost comparisons,
which would be better done if we had Path representations for the various
ways we'd want to consider.  Also, a big part of the issue there is
wanting to be able to consider sorted versus unsorted plans for the leaf
queries of the set-op (IOW, optionally pushing the sort requirements of
the set-op down into the leaves).  Right now, such comparisons are
impossible because prepunion.c uses subquery_planner to handle the leaf
queries, and what it gets back from that is one finished plan, not
alternative Paths.

* Likewise, subqueries-in-FROM are handled by recursing to
subquery_planner, which gives us back just one frozen Plan for the
subquery.  Among other things this seems to make it too expensive to
consider generating parameterized paths for the subquery.  I'd like
to keep subquery plans in Path form until much later as well.

So these considerations motivate wishing that the result of
subquery_planner could be a list of alternative Paths rather than a Plan,
which means that every operation it knows how to tack onto the scan/join
plan has to be representable by a Path of some sort.

I don't know how granular that needs to be, though.  For instance, one
could certainly imagine that it might be sufficient initially to have a
single WindowPath that represents do all the window functions, and
then at create_plan time we'd generate multiple WindowAgg plan nodes in
the same ad-hoc way as now.  Breaking that down in the Path representation
would only become interesting if it would affect higher-level decisions,
and I'm not immediately seeing how it might do that.

 I'm inclined to think that it would be useful to solve the first
 problem even if we didn't solve the second one right away (but that
 might be wrong).  As a preparatory step, I'm thinking it would be
 sensible to split grouping_planner() into an outer function that would
 handle the addition of Limit and LockRows nodes and maybe planning of
 set operations, and an inner function that would handle GROUP BY,
 DISTINCT, and possibly window function planning.

For the reasons I mentioned, I'd like to get to a point where
subquery_planner's output is Paths not Plans as soon as possible.  But the
idea of coarse representation of steps that we aren't trying to be smart
about might be useful to save some labor in the short run.

The zero-order version of that might be a single Path node type that
represents do whatever grouping_planner would do, which we'd start to
break down into multiple node types once we had the other APIs fixed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers