Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-04-21 Thread Andrew Gierth
 Svenne == Svenne Krap sve...@krap.dk writes:

 Svenne I have the explains,

Can you post the explain analyze outputs?

If need be, you can anonymize the table and column names and any
identifiers by using the anonymization option of explain.depesz.com, but
please only do that if you actually need to.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-04-20 Thread Svenne Krap
Oh, and I build it on top of f92fc4c95ddcc25978354a8248d3df22269201bc

On 20-04-2015 10:36, Svenne Krap wrote:
 The following review has been posted through the commitfest application:
 make installcheck-world:  tested, failed
 Implements feature:   tested, passed
 Spec compliant:   not tested
 Documentation:tested, passed

 Hi, 

 I have (finally) found time to review this. 

 The syntax is as per spec as I can see, and the queries I have tested have 
 all produced the correct output. 

 The documentation looks good and is clear.

 I think it is spec compliant, but I am not used enough to the spec to be 
 sure. Also I have not understood the function of set quantifier 
 (DISTINCT,ALL) part in the group by clause (and hence not tested it). Hence I 
 haven't marked the spec compliant part.

 The installcheck-world fails, but in src/pl/tcl/results/pltcl_queries.out (a 
 sorting problem when looking at the diff) which should be unrelated to GSP. I 
 don't know enough of the check to know if it has already run the GSP tests..

 I have also been running a few tests on some real data. This is run on my 
 laptop with 32 GB of memory and a fast SSD. 

 The first dataset is a join between a data table of 472 MB (4,3 Mrows) and a 
 tiny multi-column lookup table. I am returning a count(*).
 Here the data is hierarchical so CUBE does not make sense. GROUPING SETS and 
 ROLLUP both works fine and if work_buffers are large enough it slightly beats 
 the handwritten union all equivalent (runtimes as 7,6 seconds  to 7,7 
 seconds). If work_buffers are the default 4MB the union-all-equivalent (UAE) 
 beats the GS-query almost 2:1 due to disk spill (14,3 (GS) vs. 8,2 (UAE) 
 seconds). 

 The other query is on the same datatable as before, but with three columns 
 (two calculated and one natural) for a cube. I am returning a count(*). 
 First column is extract year from date column
 Second column is divide a value by something and truncate (i.e. make 
 buckets)
 Third column is a litteral integer column. 
 Here the GS-version is slightly slower than the UAE-version (17,5 vs. 14,2). 
 Nothing obvious about why in the explain (analyze,buffers,costs,timing) .

 I have the explains, but as the dataset is semi-private and I don't have any 
 easy way to edit out names in it, I will send it on request (non-disclosure 
 from the recipient is of course a must) and not post it on the list.

 I think the feature is ready to be commited, but am unsure whether I am 
 qualified to gauge that :)

 /Svenne

 The new status of this patch is: Ready for Committer





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-04-20 Thread Svenne Krap
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

Hi, 

I have (finally) found time to review this. 

The syntax is as per spec as I can see, and the queries I have tested have all 
produced the correct output. 

The documentation looks good and is clear.

I think it is spec compliant, but I am not used enough to the spec to be sure. 
Also I have not understood the function of set quantifier (DISTINCT,ALL) part 
in the group by clause (and hence not tested it). Hence I haven't marked the 
spec compliant part.

The installcheck-world fails, but in src/pl/tcl/results/pltcl_queries.out (a 
sorting problem when looking at the diff) which should be unrelated to GSP. I 
don't know enough of the check to know if it has already run the GSP tests..

I have also been running a few tests on some real data. This is run on my 
laptop with 32 GB of memory and a fast SSD. 

The first dataset is a join between a data table of 472 MB (4,3 Mrows) and a 
tiny multi-column lookup table. I am returning a count(*).
Here the data is hierarchical so CUBE does not make sense. GROUPING SETS and 
ROLLUP both works fine and if work_buffers are large enough it slightly beats 
the handwritten union all equivalent (runtimes as 7,6 seconds  to 7,7 
seconds). If work_buffers are the default 4MB the union-all-equivalent (UAE) 
beats the GS-query almost 2:1 due to disk spill (14,3 (GS) vs. 8,2 (UAE) 
seconds). 

The other query is on the same datatable as before, but with three columns 
(two calculated and one natural) for a cube. I am returning a count(*). 
First column is extract year from date column
Second column is divide a value by something and truncate (i.e. make buckets)
Third column is a litteral integer column. 
Here the GS-version is slightly slower than the UAE-version (17,5 vs. 14,2). 
Nothing obvious about why in the explain (analyze,buffers,costs,timing) .

I have the explains, but as the dataset is semi-private and I don't have any 
easy way to edit out names in it, I will send it on request (non-disclosure 
from the recipient is of course a must) and not post it on the list.

I think the feature is ready to be commited, but am unsure whether I am 
qualified to gauge that :)

/Svenne

The new status of this patch is: Ready for Committer


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-04-12 Thread Michael Paquier
On Sun, Apr 12, 2015 at 2:26 PM, 彭瑞华 ruihua.p...@163.com wrote:
 I am using postgesql 9.4.0. Thanks for your great work on grouping sets
 patch effort.
 I am now compiling postgresql from source code 9.4.0 on Linux platform with
 [gsp-all.patch]  successed and grouping function well, but failed on window
 platform(windows 2003 or window 7).
  It shows unrecognized keywords: grouping, cube,  etc.
 Good suggestions or tips? thanks a lot!

When reviewing or testing a patch targeted for integration in 9.5, you
should avoid using a previous major version but you should apply the
patch on a version based on one of the latest commits of master branch
in the git repository of Postgres. I am guessing that you cannot get
the patch compiling because of some code conflicts.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-04-12 Thread 彭瑞华
ok, I will try it using git master branch source code. thanks! Of course, using 
gsp-all-latest.patch this time.




At 2015-04-12 14:23:46, Michael Paquier michael.paqu...@gmail.com wrote:
On Sun, Apr 12, 2015 at 2:26 PM, 彭瑞华 ruihua.p...@163.com wrote:
 I am using postgesql 9.4.0. Thanks for your great work on grouping sets
 patch effort.
 I am now compiling postgresql from source code 9.4.0 on Linux platform with
 [gsp-all.patch]  successed and grouping function well, but failed on window
 platform(windows 2003 or window 7).
  It shows unrecognized keywords: grouping, cube,  etc.
 Good suggestions or tips? thanks a lot!

When reviewing or testing a patch targeted for integration in 9.5, you
should avoid using a previous major version but you should apply the
patch on a version based on one of the latest commits of master branch
in the git repository of Postgres. I am guessing that you cannot get
the patch compiling because of some code conflicts.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-04-11 Thread 彭瑞华
Dear:
I am using postgesql 9.4.0. Thanks for your great work on grouping sets patch 
effort.
I am now compiling postgresql from source code 9.4.0 on Linux platform with 
[gsp-all.patch]  successed and grouping function well, but failed on window 
platform(windows 2003 or window 7).
 It shows unrecognized keywords: grouping, cube,  etc.
Good suggestions or tips? thanks a lot!


henry















Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-03-27 Thread Svenne Krap

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 18-03-2015 17:18, Svenne Krap wrote:

 I still need to check against the standard and I will run it against a
non-trivival production load... hopefully I will finish up my review
shortly after the weekend...

I am still on it, but a little delayed. I hope to get it done this weekend.

Svenne




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-03-26 Thread Andrew Gierth
 Svenne == Svenne Krap sve...@krap.dk writes:

 Svenne I still need to check against the standard and I will run it
 Svenne against a non-trivival production load... hopefully I will
 Svenne finish up my review shortly after the weekend...

Thanks for the review so far; any progress? I'm quite interested in
collecting samples of realistic grouping sets queries and their
performance, for use in possible further optimization work. (I don't
need full data or anything like that, just this query ran in x seconds
on N million rows, which is fast enough/not fast enough/too slow to be
any use)

Let me know if there's anything you need...

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-03-18 Thread Svenne Krap
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

This is a midway review, a later will complete it. 

The patch applies against 8d1f239003d0245dda636dfa6cf0add13bee69d6 and builds 
correctly. Make installcheck-world fails, but it seems to be somewhere totally 
unrelated (TCL pl)... 

The documentation is very well-written and the patch implements the documented 
syntax.

I still need to check against the standard and I will run it against a 
non-trivival production load... hopefully I will finish up my review shortly 
after the weekend...



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-03-13 Thread Svenne Krap
Patch from message (87d24iukc5@news-spur.riddles.org.uk) fails (tried to 
apply on top of ebc0f5e01d2f ), as b55722692 has broken up the line (in 
src/backend/optimizer/util/pathnode.c):

pathnode-path.rows = estimate_num_groups(root, uniq_exprs, rel-rows);

After patching the added parameter (NULL) in by hand, the build fails as 
src/backend/optimizer/path/indxpath.c:1953 misses the new argument as well - 
this change is not in the patch. 

/Svenne

The new status of this patch is: Waiting on Author


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-09-09 Thread Robert Haas
On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
 Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes:
  Heikki Uh, that's ugly. The EXPLAIN out I mean; as an implementation
  Heikki detail chaining the nodes might be reasonable. But the above
  Heikki gets unreadable if you have more than a few grouping sets.

 It's good for highlighting performance issues in EXPLAIN, too.

Perhaps so, but that doesn't take away from Heikki's point: it's still
ugly.  I don't understand why the sorts can't all be nested under the
GroupAggregate nodes.  We have a number of nodes already (e.g. Append)
that support an arbitrary number of children, and I don't see why we
can't do the same thing here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-09-09 Thread Robert Haas
On Tue, Sep 9, 2014 at 11:19 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2014-09-09 16:01 GMT+02:00 Robert Haas robertmh...@gmail.com:
 On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth
 and...@tao11.riddles.org.uk wrote:
  Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes:
   Heikki Uh, that's ugly. The EXPLAIN out I mean; as an implementation
   Heikki detail chaining the nodes might be reasonable. But the above
   Heikki gets unreadable if you have more than a few grouping sets.
 
  It's good for highlighting performance issues in EXPLAIN, too.

 Perhaps so, but that doesn't take away from Heikki's point: it's still
 ugly.  I don't understand why the sorts can't all be nested under the
 GroupAggregate nodes.  We have a number of nodes already (e.g. Append)
 that support an arbitrary number of children, and I don't see why we
 can't do the same thing here.

 I don't think so showing sort and aggregation is bad idea. Both can have a
 different performance impacts

Sure, showing the sort and aggregation steps is fine.  But I don't see
what advantage we get out of showing them like this:

Aggregate
- Sort
  - ChainAggregate
- Sort
  - ChainAggregate
 - Sort

When we could show them like this:

Aggregate
- Sort
- Sort
- Sort

From both a display perspective and an implementation-complexity
perspective, it seems appealing to have the Aggregate node feed the
data to one sort after another, rather having it send the data down a
very deep pipe.

I might be missing something, of course.  I don't want to presume that
I'm smarter than Andrew, because Andrew is pretty smart.  :-)  But it
seems odd to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-09-09 Thread Pavel Stehule
2014-09-09 16:01 GMT+02:00 Robert Haas robertmh...@gmail.com:

 On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth
 and...@tao11.riddles.org.uk wrote:
  Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes:
   Heikki Uh, that's ugly. The EXPLAIN out I mean; as an implementation
   Heikki detail chaining the nodes might be reasonable. But the above
   Heikki gets unreadable if you have more than a few grouping sets.
 
  It's good for highlighting performance issues in EXPLAIN, too.

 Perhaps so, but that doesn't take away from Heikki's point: it's still
 ugly.  I don't understand why the sorts can't all be nested under the
 GroupAggregate nodes.  We have a number of nodes already (e.g. Append)
 that support an arbitrary number of children, and I don't see why we
 can't do the same thing here.


I don't think so showing sort and aggregation is bad idea. Both can have a
different performance impacts





 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-09-09 Thread Andrew Gierth
 Robert == Robert Haas robertmh...@gmail.com writes:

 Robert Sure, showing the sort and aggregation steps is fine.  But I
 Robert don't see what advantage we get out of showing them like
 Robert this:

 Robert Aggregate
 Robert - Sort
 Robert- ChainAggregate
 Robert   - Sort
 Robert  - ChainAggregate
 Robert - Sort

The advantage is that this is how the plan tree is actually
structured.

 Robert When we could show them like this:

 Robert Aggregate
 Robert  - Sort
 Robert  - Sort
 Robert  - Sort

And we can't structure the plan tree like this, because then it
wouldn't be a _tree_ any more.

The Sort node expects to have a child node to fetch rows from, and it
expects all the usual plan tree mechanics (initialization, rescan,
etc.) to work on that child node. There's no way for the parent to
feed data to the child.

 Robert From both a display perspective and an
 Robert implementation-complexity perspective,

... says the person who has never tried implementing it.

Honestly, ChainAggregate is _trivial_ compared to trying to make the
GroupAggregate code deal with multiple inputs, or trying to make some
new sort of plumbing node to feed input to those sorts.  (You'd think
that it should be possible to use the existing CTE mechanics to do it,
but n... the existing code is actively and ferociously hostile to
the idea of adding new CTEs from within the planner.)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-09-09 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Sep 9, 2014 at 12:01 PM, Andrew Gierth
 and...@tao11.riddles.org.uk wrote:
 Honestly, ChainAggregate is _trivial_ compared to trying to make the
 GroupAggregate code deal with multiple inputs, or trying to make some
 new sort of plumbing node to feed input to those sorts.  (You'd think
 that it should be possible to use the existing CTE mechanics to do it,
 but n... the existing code is actively and ferociously hostile to
 the idea of adding new CTEs from within the planner.)

 That's unfortunate.

I'm less than convinced that it's true ... I've been meaning to find
time to review this patch, but it sounds like it's getting to the point
where I need to.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-09-09 Thread Robert Haas
On Tue, Sep 9, 2014 at 12:01 PM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
 Robert == Robert Haas robertmh...@gmail.com writes:

  Robert Sure, showing the sort and aggregation steps is fine.  But I
  Robert don't see what advantage we get out of showing them like
  Robert this:

  Robert Aggregate
  Robert - Sort
  Robert- ChainAggregate
  Robert   - Sort
  Robert  - ChainAggregate
  Robert - Sort

 The advantage is that this is how the plan tree is actually
 structured.

I do understand that.  I am questioning (as I believe Heikki was also)
whether it's structured correctly.  Nobody is arguing for displaying
the plan tree in a way that doesn't mirror it's actual structure, or
at least I am not.

 The Sort node expects to have a child node to fetch rows from, and it
 expects all the usual plan tree mechanics (initialization, rescan,
 etc.) to work on that child node. There's no way for the parent to
 feed data to the child.

OK, good point.  So we do need something that can feed data from one
part of the plan tree to another, like a CTE does.  I still think it
would be worth trying to see if there's a reasonable way to structure
the plan tree so that it's flatter.

  Robert From both a display perspective and an
  Robert implementation-complexity perspective,

 ... says the person who has never tried implementing it.

This comment to me reads rather sharply, and I don't feel that the
tone of my email is such as to merit a rebuke.

 Honestly, ChainAggregate is _trivial_ compared to trying to make the
 GroupAggregate code deal with multiple inputs, or trying to make some
 new sort of plumbing node to feed input to those sorts.  (You'd think
 that it should be possible to use the existing CTE mechanics to do it,
 but n... the existing code is actively and ferociously hostile to
 the idea of adding new CTEs from within the planner.)

That's unfortunate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-09-09 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  Honestly, ChainAggregate is _trivial_ compared to trying to make the
  GroupAggregate code deal with multiple inputs, or trying to make some
  new sort of plumbing node to feed input to those sorts.  (You'd think
  that it should be possible to use the existing CTE mechanics to do it,
  but n... the existing code is actively and ferociously hostile to
  the idea of adding new CTEs from within the planner.)

  That's unfortunate.

 Tom I'm less than convinced that it's true ...

Maybe you can figure out how, but I certainly didn't see a reasonable way.

I would also question one aspect of the desirability - using the CTE
mechanism has the downside of needing an extra tuplestore with the
full input data set in it, whereas the chain mechanism only has
aggregated data in its tuplestore which should be much smaller.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-27 Thread Robert Haas
On Mon, Aug 25, 2014 at 1:35 AM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
 If you look at the latest patch post, there's a small patch in it that
 does nothing but unreserve the keywords and fix ruleutils to make
 deparse/parse work. The required fix to ruleutils is an example of
 violating your four kinds of keywords principle, but quoting
 keywords still works.

I think it would be intolerable to lose the ability to quote keywords.
That could easily create situations where there's no reasonable way to
dump an older database in such a fashion that it can be reloaded into
a newer database.  So it's good that you avoided that.

The four kinds of keywords principle is obviously much less
absolute.  We've talked before about introducing additional categories
of keywords, and that might be a good thing to do for one reason or
another.  But I think it's not good to do it in a highly idiosyncratic
way: I previously proposed reserving concurrently only when it follows
CREATE INDEX, and not in any other context, but Tom argued that it had
to become a type_func_name_keyword since users would be confused to
find that concurrently (but not any other keyword) needed quoting
there.  In retrospect, I tend to think he probably had it right.
There is a good amount of third-party software out there that tries to
be smart about quoting PostgreSQL keywords - for example, pgAdmin has
code for that, or did last I looked - so by making things more
complicated, we run the risk not only of bugs in our own software but
also bugs in other people's software, as well as user confusion.  So I
still think the right solution is probably to reserve CUBE across the
board, and not just in the narrowest context that we can get away
with.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-24 Thread Andrew Gierth
 Robert == Robert Haas robertmh...@gmail.com writes:

 Robert I can accept ugly code, but I feel strongly that we shouldn't
 Robert accept ugly semantics.  Forcing cube to get out of the way
 Robert may not be pretty, but I think it will be much worse if we
 Robert violate the rule that quoting a keyword strips it of its
 Robert special meaning; or the rule that there are four kinds of
 Robert keywords and, if a keyword of a particular class is accepted
 Robert as an identifier in a given context, all other keywords in
 Robert that class will also be accepted as identifiers in that
 Robert context.  Violating those rules could have not-fun-at-all
 Robert consequences like needing to pass additional context
 Robert information to ruleutils.c's quote_identifier() function, or
 Robert not being able to dump and restore a query from an older
 Robert version even with --quote-all-identifiers.  Renaming the cube
 Robert type will suck for people who are using it; but it will only
 Robert have to be done once; weird stuff like the above will be with
 Robert us forever.

If you look at the latest patch post, there's a small patch in it that
does nothing but unreserve the keywords and fix ruleutils to make
deparse/parse work. The required fix to ruleutils is an example of
violating your four kinds of keywords principle, but quoting
keywords still works.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom Perhaps so.  I would really prefer not to have to get into
 Tom estimating how many people will be inconvenienced how badly.
 Tom It's clear to me that not a lot of sweat has been put into
 Tom seeing if we can avoid reserving the keyword, and I think we
 Tom need to put in that effort.

So, first nontrivial issue that crops up is this: if CUBE is
unreserved, then ruleutils will output the string cube(a,b) for a
function call to a function named cube, on the assumption that it
will parse back as a single unit (which inside a GROUP BY is not
true).

Options:

1) when outputting GROUP BY clauses (and nothing else), put parens
around anything that isn't provably atomic; or put parens around
anything that might look like a function call; or put parens around
anything that looks like a function call with a keyword name

2) when outputting any function call, add parens if the name is an
unreserved keyword

3) when outputting any function call, quote the name if it is an
unreserved keyword

4) something else?

(This of course means that if someone has a cube() function call in
a group by clause of a view, then upgrading will change the meaning
of the view and possibly fail to create it; there seems to be no fix
for this, not even using the latest pg_dump, since pg_dump relies on
the old server's ruleutils)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Stephen Frost
* Andrew Gierth (and...@tao11.riddles.org.uk) wrote:
 Having now spent some more time looking, I believe there is a solution
 which makes it unreserved which does not require any significant pain
 in the code.  I'm not entirely convinced that this is the right
 approach in the long term, but it might allow for a more planned
 transition.
 
 The absolute minimum seems to be:
 
  GROUPING as a col_name_keyword (since GROUPING(x,y,...) in the select
  list as a grouping operation looks like a function call for any
  argument types)
 
  CUBE, ROLLUP, SETS as unreserved_keyword

This means

GROUP BY cube(x,y)
GROUP BY (cube(x,y))
GROUP BY cube(x)

all end up with different meanings though, right?

I'm not sure that's really a better situation.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Alvaro Herrera
Andrew Gierth wrote:

 (This of course means that if someone has a cube() function call in
 a group by clause of a view, then upgrading will change the meaning
 of the view and possibly fail to create it; there seems to be no fix
 for this, not even using the latest pg_dump, since pg_dump relies on
 the old server's ruleutils)

This sucks.  Can we tweak pg_dump to check for presence of the cube
extension, and if found refuse to dump unless a minor version older than
some hardcoded version (known to have fixed ruleutils) is used?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Andrew Gierth
 Alvaro == Alvaro Herrera alvhe...@2ndquadrant.com writes:

  (This of course means that if someone has a cube() function call
  in a group by clause of a view, then upgrading will change the
  meaning of the view and possibly fail to create it; there seems to
  be no fix for this, not even using the latest pg_dump, since
  pg_dump relies on the old server's ruleutils)

 Alvaro This sucks.  Can we tweak pg_dump to check for presence of
 Alvaro the cube extension, and if found refuse to dump unless a
 Alvaro minor version older than some hardcoded version (known to
 Alvaro have fixed ruleutils) is used?

I honestly don't think it's worth it. cube() is not a function that
really makes any sense in a GROUP BY, though of course someone could
have written their own function called cube() that does something
else; while this case is a problem, it is also likely to be
vanishingly rare.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Robert Haas
On Thu, Aug 21, 2014 at 2:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom I wonder if you've tried hard enough to avoid reserving the keyword.

 GROUP BY cube(a,b)  is currently legal syntax and means something completely
 incompatible to what the spec requires.

 Well, if there are any extant applications that use that exact phrasing,
 they're going to be broken in any case.  That does not mean that we have
 to break every other appearance of cube.  I think that special-casing
 appearances of cube(...) in GROUP BY lists might be a feasible approach.

Not really.  As pointed out downthread, you can't distinguish cube
from CUBE.  We could fix that with a big enough hammer, of course, but
it would be a mighty big hammer.

More generally, I think it makes a lot of sense to work harder to
reserve keywords less when there's no fundamental semantic conflict,
but when there is, trying to do things like special case stuff
depending on context results in a situation where some keywords are a
little more reserved than others.  As you pointed out in discussions
of CREATE INDEX CONCURRENTLY, that's confusing:

http://www.postgresql.org/message-id/10769.1261775...@sss.pgh.pa.us
(refer second paragraph)

I think we should:

(1) Rename the cube extension.  With a bat.  I have yet to encounter a
single user who is using it, but there probably are some.  They'll
have to get over it; GROUPING SETS is roughly a hundred times more
important than the cube extension.  The most I'd do to cater to
existing users of the extension is provide an SQL script someplace
that renames the extension and all of its containing objects so that
you can do that before running pg_dump/pg_upgrade.

(2) Reserve CUBE to the extent necessary to implement this feature.
Some people won't like this, but that's always true when we reserve a
keyword, and I don't think refusing to implement an SQL-standard
feature for which there is considerable demand is the right way to fix
that.  Here are the last five keywords we partially or fully reserved:
LATERAL (fully reserved), COLLATION (type_func_name), XMLEXISTS
(col_name), BETWEEN (was type_func_name, became col_name),
CONCURRENTLY (type_func_name).  That takes us back to December 2009,
so the rate at which we do this is just under one a year, and there
haven't been many screams about it.  I grant you that cube is a
slightly more plausible identifier than any of those, but I don't
think we should let the fact that we happen to have an extension with
that name prejudice us too much about how common it really is.

Mind you, I'm not trying to say that we don't need to be judicious in
reserving keywords, or even adding them at all: I've argued against
those things on numerous occasions, and have done work to let us get
rid of keywords we've previously had.  I just think that this is a big
enough, important enough feature that we'll please more people than we
disappoint.  And I think trying to walk some middle way where we
distinguish on context is going to be a mess.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 21, 2014 at 2:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, if there are any extant applications that use that exact phrasing,
 they're going to be broken in any case.  That does not mean that we have
 to break every other appearance of cube.  I think that special-casing
 appearances of cube(...) in GROUP BY lists might be a feasible approach.

 Not really.  As pointed out downthread, you can't distinguish cube
 from CUBE.  We could fix that with a big enough hammer, of course, but
 it would be a mighty big hammer.

I'm not convinced of that; I think some creative hackery in the grammar
might be able to deal with this.  It would be a bit ugly, for sure, but
if it works it would be a localized fix.  Meanwhile, I don't believe
that it's going to be possible to rename the cube extension in any way
that's even remotely acceptable for its users (remotely acceptable
here means pg_upgrade works, never mind what's going to be needed
to fix their applications).  So the proposal you are pushing is going
to result in seriously teeing off some fraction of our userbase;
and the argument why that would be acceptable seems to boil down to
I think there are few enough of them that we don't have to care
(an opinion based on little evidence IMO).  I think it's worth investing
some work, and perhaps accepting some ugly code, to try to avoid that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Greg Stark
On Fri, Aug 22, 2014 at 7:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So the proposal you are pushing is going
 to result in seriously teeing off some fraction of our userbase;
 and the argument why that would be acceptable seems to boil down to
 I think there are few enough of them that we don't have to care
 (an opinion based on little evidence IMO

FWIW here's some evidence... Craig Kersteins did a talk on the
statistics across the Heroku fleet: Here are the slides from 2013
though I think there's an updated slide deck with more recent numbers
out there:
https://speakerdeck.com/craigkerstiens/postgres-what-they-really-use

Cube shows up as the number 9 most popular extension with about 1% of
databases having it installed (tied with pg_crypto and earthdistance).
That's a lot more than I would have expected actually.

Personally I would love to change the name because I always found the
name the most confusing thing about it. It took me forever to figure
out what on earth a cube was. It's actually a vector data type which
is actually a pretty useful idea.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Robert Haas
On Fri, Aug 22, 2014 at 2:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 21, 2014 at 2:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, if there are any extant applications that use that exact phrasing,
 they're going to be broken in any case.  That does not mean that we have
 to break every other appearance of cube.  I think that special-casing
 appearances of cube(...) in GROUP BY lists might be a feasible approach.

 Not really.  As pointed out downthread, you can't distinguish cube
 from CUBE.  We could fix that with a big enough hammer, of course, but
 it would be a mighty big hammer.

 I'm not convinced of that; I think some creative hackery in the grammar
 might be able to deal with this.  It would be a bit ugly, for sure, but
 if it works it would be a localized fix.

Well, I have no idea how to do that.  I think the only way you'd be
able to is if you make productions like ColId and ColLabel return
something different for a keyword than they do for an IDENT.  And
that's not going to be a localized change.  If you've got another
proposal, I'm all ears...

 Meanwhile, I don't believe
 that it's going to be possible to rename the cube extension in any way
 that's even remotely acceptable for its users (remotely acceptable
 here means pg_upgrade works, never mind what's going to be needed
 to fix their applications).

 So the proposal you are pushing is going
 to result in seriously teeing off some fraction of our userbase;
 and the argument why that would be acceptable seems to boil down to
 I think there are few enough of them that we don't have to care
 (an opinion based on little evidence IMO).

The only hard statistics I am aware of are from Heroku.  Peter
Geoghegan was kind enough to find me the link:

https://www.youtube.com/watch?v=MT2gzzbyWpw

At around 8 minutes, he shows utilization statistics for cube at
around 1% across their install base.  That's higher than I would have
guessed, so, eh, shows what I know.  In any case, I'm not so much
advocating not caring at all as confining the level of caring to what
can be done without moving the earth.

 I think it's worth investing
 some work, and perhaps accepting some ugly code, to try to avoid that.

I can accept ugly code, but I feel strongly that we shouldn't accept
ugly semantics.  Forcing cube to get out of the way may not be pretty,
 but I think it will be much worse if we violate the rule that quoting
a keyword strips it of its special meaning; or the rule that there are
four kinds of keywords and, if a keyword of a particular class is
accepted as an identifier in a given context, all other keywords in
that class will also be accepted as identifiers in that context.
Violating those rules could have not-fun-at-all consequences like
needing to pass additional context information to ruleutils.c's
quote_identifier() function, or not being able to dump and restore a
query from an older version even with --quote-all-identifiers.
Renaming the cube type will suck for people who are using it; but it
will only have to be done once; weird stuff like the above will be
with us forever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Andrew Dunstan


On 08/22/2014 02:42 PM, Greg Stark wrote:

On Fri, Aug 22, 2014 at 7:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:

So the proposal you are pushing is going
to result in seriously teeing off some fraction of our userbase;
and the argument why that would be acceptable seems to boil down to
I think there are few enough of them that we don't have to care
(an opinion based on little evidence IMO

FWIW here's some evidence... Craig Kersteins did a talk on the
statistics across the Heroku fleet: Here are the slides from 2013
though I think there's an updated slide deck with more recent numbers
out there:
https://speakerdeck.com/craigkerstiens/postgres-what-they-really-use

Cube shows up as the number 9 most popular extension with about 1% of
databases having it installed (tied with pg_crypto and earthdistance).
That's a lot more than I would have expected actually.



That's an interesting statistic. What I'd be more interested in is 
finding out how many of those are actually using it as opposed to having 
loaded it into a database.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Merlin Moncure
On Fri, Aug 22, 2014 at 1:52 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Aug 22, 2014 at 2:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 https://www.youtube.com/watch?v=MT2gzzbyWpw

 At around 8 minutes, he shows utilization statistics for cube at
 around 1% across their install base.  That's higher than I would have
 guessed, so, eh, shows what I know.  In any case, I'm not so much
 advocating not caring at all as confining the level of caring to what
 can be done without moving the earth.

cube is a dependency for earthdistance and it's gotten some light
advocacy throughout the years as the 'way to do it'.  I tried it
myself way back in the day and concluded a while back that the 'box'
type + gist was better than earthdistance for bounding box operations
-- it just seemed easier to understand and use.  If you search the
archives you'll probably find a couple of examples of me suggesting as
such.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Stephen Frost
* Andrew Dunstan (and...@dunslane.net) wrote:
 
 On 08/22/2014 02:42 PM, Greg Stark wrote:
 On Fri, Aug 22, 2014 at 7:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So the proposal you are pushing is going
 to result in seriously teeing off some fraction of our userbase;
 and the argument why that would be acceptable seems to boil down to
 I think there are few enough of them that we don't have to care
 (an opinion based on little evidence IMO
 FWIW here's some evidence... Craig Kersteins did a talk on the
 statistics across the Heroku fleet: Here are the slides from 2013
 though I think there's an updated slide deck with more recent numbers
 out there:
 https://speakerdeck.com/craigkerstiens/postgres-what-they-really-use
 
 Cube shows up as the number 9 most popular extension with about 1% of
 databases having it installed (tied with pg_crypto and earthdistance).
 That's a lot more than I would have expected actually.
 
 
 That's an interesting statistic. What I'd be more interested in is
 finding out how many of those are actually using it as opposed to
 having loaded it into a database.

Agreed- and how many of those have *every extension available* loaded...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Greg Stark
On Fri, Aug 22, 2014 at 10:37 PM, Stephen Frost sfr...@snowman.net wrote:
 Agreed- and how many of those have *every extension available* loaded...

Actually that was also in the talk.a few slides later. 0.7%

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Stephen Frost
* Greg Stark (st...@mit.edu) wrote:
 On Fri, Aug 22, 2014 at 10:37 PM, Stephen Frost sfr...@snowman.net wrote:
  Agreed- and how many of those have *every extension available* loaded...
 
 Actually that was also in the talk.a few slides later. 0.7%

So, 0.3% install cube w/o installing *every* extension..?  That seems
like the more relevant number then, to me anyway.

Admittedly, it's non-zero, but it's also a rather small percentage..

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-22 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom I'm not convinced of that; I think some creative hackery in the
 Tom grammar might be able to deal with this.

Making GROUP BY CUBE(a,b) parse as grouping sets rather than as a
function turned out to be the easy part: give CUBE a lower precedence
than '(' (equal to the one for IDENT and various other unreserved
keywords), and a rule that has an explicit CUBE '(' gets preferred
over one that reduces the CUBE to an unreserved_keyword.

The (relatively minor) ugliness required is mostly in the ruleutils
logic to decide how to output a cube(...) function call in such a way
that it doesn't get misparsed as a grouping set. See my other mail on
that.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Heikki Linnakangas

On 08/13/2014 09:43 PM, Atri Sharma wrote:

Sorry, forgot to attach the patch for fixing cube in contrib, which breaks
since we now reserve cube keyword. Please find attached the same.


Ugh, that will make everyone using the cube extension unhappy. After 
this patch, they will have to quote contrib's cube type and functions 
every time.


I think we should bite the bullet and rename the extension, and its 
cube type and functions. For an application, having to suddenly quote 
it has the same effect as renaming it; you'll have to find all the 
callers and change them. And in the long-run, it's clearly better to 
have an unambiguous name.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Andrew Gierth
 Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes:

  On 08/13/2014 09:43 PM, Atri Sharma wrote:
  Sorry, forgot to attach the patch for fixing cube in contrib,
  which breaks since we now reserve cube keyword. Please find
  attached the same.

 Heikki Ugh, that will make everyone using the cube extension
 Heikki unhappy. After this patch, they will have to quote contrib's
 Heikki cube type and functions every time.

 Heikki I think we should bite the bullet and rename the extension,

I agree, the contrib/cube patch as posted is purely so we could test
everything without having to argue over the new name first.  (And it
is posted separately from the main patch because of its length and
utter boringness.)

However, even if/when a new name is chosen, there's the question of
how to make the upgrade path easiest. Once CUBE is reserved,
up-to-date pg_dump will quote all uses of the cube type and function
when dumping an older database (except inside function bodies of
course), so there may be merit in keeping a cube domain over the new
type, and maybe also merit in keeping the extension name.

So what's the new type name going to be? cuboid? hypercube?
geometric_cube?  n_dimensional_box?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Andrew Gierth

A progress update:

 Atri  We envisage that handling of arbitrary grouping sets will be
 Atri best done by having the planner generating an Append of
 Atri multiple aggregation paths, presumably with some way of moving
 Atri the original input path to a CTE. We have not really explored
 Atri yet how hard this will be; suggestions are welcome.

This idea was abandoned.

Instead, we have implemented full support for arbitrary grouping sets
by means of a chaining system:

explain (verbose, costs off) select four, ten, hundred, count(*) from onek 
group by cube(four,ten,hundred);

 QUERY PLAN 
 
-
 GroupAggregate
   Output: four, ten, hundred, count(*)
   Grouping Sets: (onek.hundred, onek.four, onek.ten), (onek.hundred, 
onek.four), (onek.hundred), ()
   -  Sort
 Output: four, ten, hundred
 Sort Key: onek.hundred, onek.four, onek.ten
 -  ChainAggregate
   Output: four, ten, hundred
   Grouping Sets: (onek.ten, onek.hundred), (onek.ten)
   -  Sort
 Output: four, ten, hundred
 Sort Key: onek.ten, onek.hundred
 -  ChainAggregate
   Output: four, ten, hundred
   Grouping Sets: (onek.four, onek.ten), (onek.four)
   -  Sort
 Output: four, ten, hundred
 Sort Key: onek.four, onek.ten
 -  Seq Scan on public.onek
   Output: four, ten, hundred
(20 rows)

The ChainAggregate nodes use a tuplestore to communicate with the
GroupAggregate node at the top of the chain; they pass through input
tuples unchanged, and write aggregated result rows to the tuplestore,
which the top node then returns once it has finished its own result.

The organization of the planner code seems to be actively hostile to
any attempt to break out new CTEs on the fly, or to plan parts of the
query more than once; the method above seems to be the easiest way to
avoid those issues.

 Atri At this point we are more interested in design review rather
 Atri than necessarily committing this patch in its current state.

This no longer applies; we expect to post within a day or two an
updated patch with full functionality.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Heikki Linnakangas

On 08/21/2014 01:28 PM, Andrew Gierth wrote:


A progress update:

  Atri  We envisage that handling of arbitrary grouping sets will be
  Atri best done by having the planner generating an Append of
  Atri multiple aggregation paths, presumably with some way of moving
  Atri the original input path to a CTE. We have not really explored
  Atri yet how hard this will be; suggestions are welcome.

This idea was abandoned.

Instead, we have implemented full support for arbitrary grouping sets
by means of a chaining system:

explain (verbose, costs off) select four, ten, hundred, count(*) from onek 
group by cube(four,ten,hundred);

  QUERY PLAN
-
  GroupAggregate
Output: four, ten, hundred, count(*)
Grouping Sets: (onek.hundred, onek.four, onek.ten), (onek.hundred, 
onek.four), (onek.hundred), ()
-  Sort
  Output: four, ten, hundred
  Sort Key: onek.hundred, onek.four, onek.ten
  -  ChainAggregate
Output: four, ten, hundred
Grouping Sets: (onek.ten, onek.hundred), (onek.ten)
-  Sort
  Output: four, ten, hundred
  Sort Key: onek.ten, onek.hundred
  -  ChainAggregate
Output: four, ten, hundred
Grouping Sets: (onek.four, onek.ten), (onek.four)
-  Sort
  Output: four, ten, hundred
  Sort Key: onek.four, onek.ten
  -  Seq Scan on public.onek
Output: four, ten, hundred
(20 rows)


Uh, that's ugly. The EXPLAIN out I mean; as an implementation detail 
chaining the nodes might be reasonable. But the above gets unreadable if 
you have more than a few grouping sets.



The ChainAggregate nodes use a tuplestore to communicate with the
GroupAggregate node at the top of the chain; they pass through input
tuples unchanged, and write aggregated result rows to the tuplestore,
which the top node then returns once it has finished its own result.


Hmm, so there's a magic link between the GroupAggregate at the top and 
all the ChainAggregates, via the tuplestore. That may be fine, we have 
special rules in passing information between bitmap scan nodes too.


But rather than chain multiple ChainAggregate nodes, how about just 
doing all the work in the top GroupAggregate node?



  Atri At this point we are more interested in design review rather
  Atri than necessarily committing this patch in its current state.

This no longer applies; we expect to post within a day or two an
updated patch with full functionality.


Ok, cool

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes:
  Heikki I think we should bite the bullet and rename the extension,

 I agree, the contrib/cube patch as posted is purely so we could test
 everything without having to argue over the new name first.

I wonder if you've tried hard enough to avoid reserving the keyword.

I think that the cube extension is not going to be the only casualty
if cube becomes a reserved word --- that seems like a name that
could be in use in lots of applications.  (What do you mean, 9.5
breaks our database for tracking office space?)  It would be worth
quite a bit of effort to avoid that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Andrew Gierth
 Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes:

 Heikki Uh, that's ugly. The EXPLAIN out I mean; as an implementation
 Heikki detail chaining the nodes might be reasonable. But the above
 Heikki gets unreadable if you have more than a few grouping sets.

It's good for highlighting performance issues in EXPLAIN, too.

4096 grouping sets takes about a third of a second to plan and execute,
but something like a minute to generate the EXPLAIN output. However,
for more realistic sizes, plan time is not significant and explain
takes only about 40ms for 256 grouping sets.

(To avoid resource exhaustion issues, we have set a limit of,
currently, 4096 grouping sets per query level. Without such a limit,
it is easy to write queries that would take TBs of memory to parse or
plan. MSSQL and DB2 have similar limits, I'm told.)

  The ChainAggregate nodes use a tuplestore to communicate with the
  GroupAggregate node at the top of the chain; they pass through input
  tuples unchanged, and write aggregated result rows to the tuplestore,
  which the top node then returns once it has finished its own result.

 Heikki Hmm, so there's a magic link between the GroupAggregate at
 Heikki the top and all the ChainAggregates, via the tuplestore. That
 Heikki may be fine, we have special rules in passing information
 Heikki between bitmap scan nodes too.

Eh. It's far from a perfect solution, but the planner doesn't lend itself
to perfect solutions.

 Heikki But rather than chain multiple ChainAggregate nodes, how
 Heikki about just doing all the work in the top GroupAggregate node?

It was easier this way. (How would you expect to do it all in the top
node when each subset of the grouping sets list needs to see the data
in a different order?)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Pavel Stehule
2014-08-21 17:00 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

 Andrew Gierth and...@tao11.riddles.org.uk writes:
  Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes:
   Heikki I think we should bite the bullet and rename the extension,

  I agree, the contrib/cube patch as posted is purely so we could test
  everything without having to argue over the new name first.

 I wonder if you've tried hard enough to avoid reserving the keyword.

 I think that the cube extension is not going to be the only casualty
 if cube becomes a reserved word --- that seems like a name that
 could be in use in lots of applications.  (What do you mean, 9.5
 breaks our database for tracking office space?)  It would be worth
 quite a bit of effort to avoid that.


My prototypes worked without reserved keywords if I remember well

but analyzer is relatively complex

Pavel





 regards, tom lane


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  I agree, the contrib/cube patch as posted is purely so we could test
  everything without having to argue over the new name first.

 Tom I wonder if you've tried hard enough to avoid reserving the keyword.

GROUP BY cube(a,b)  is currently legal syntax and means something completely
incompatible to what the spec requires.

GROUP BY GROUPING SETS (cube(a,b), c)  -- is that cube(a,b) an expression
to group on, or a list of grouping sets to expand?

GROUP BY (cube(a,b)) -- should that be an error, or silently treat it
as a function call rather than a grouping set? What about GROUP BY
GROUPING SETS ((cube(a,b)) ? (both are errors in our patch)

Accepting those as valid implies a degree of possible confusion that I
personally regard as quite questionable.  Previous discussion seemed to
have accepted that contrib/cube was going to have to be renamed.

 Tom I think that the cube extension is not going to be the only
 Tom casualty if cube becomes a reserved word --- that seems like a
 Tom name that could be in use in lots of applications.  (What do
 Tom you mean, 9.5 breaks our database for tracking office space?)
 Tom It would be worth quite a bit of effort to avoid that.

It has been a reserved word in the spec since, what, 1999? and it is a
reserved word in mssql, oracle, db2, etc.?

It only needs to be a col_name_keyword, so it still works as a table
or column name (as usual we are less strict than the spec in that
respect).  I'm looking into whether it can be made unreserved, but I
have serious doubts about this being a good idea.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Pavel Stehule
2014-08-21 17:58 GMT+02:00 Andrew Gierth and...@tao11.riddles.org.uk:

  Tom == Tom Lane t...@sss.pgh.pa.us writes:

   I agree, the contrib/cube patch as posted is purely so we could test
   everything without having to argue over the new name first.

  Tom I wonder if you've tried hard enough to avoid reserving the keyword.

 GROUP BY cube(a,b)  is currently legal syntax and means something
 completely
 incompatible to what the spec requires.

 GROUP BY GROUPING SETS (cube(a,b), c)  -- is that cube(a,b) an expression
 to group on, or a list of grouping sets to expand?

 GROUP BY (cube(a,b)) -- should that be an error, or silently treat it
 as a function call rather than a grouping set? What about GROUP BY
 GROUPING SETS ((cube(a,b)) ? (both are errors in our patch)

 Accepting those as valid implies a degree of possible confusion that I
 personally regard as quite questionable.  Previous discussion seemed to
 have accepted that contrib/cube was going to have to be renamed.

  Tom I think that the cube extension is not going to be the only
  Tom casualty if cube becomes a reserved word --- that seems like a
  Tom name that could be in use in lots of applications.  (What do
  Tom you mean, 9.5 breaks our database for tracking office space?)
  Tom It would be worth quite a bit of effort to avoid that.

 It has been a reserved word in the spec since, what, 1999? and it is a
 reserved word in mssql, oracle, db2, etc.?

 It only needs to be a col_name_keyword, so it still works as a table
 or column name (as usual we are less strict than the spec in that
 respect).  I'm looking into whether it can be made unreserved, but I
 have serious doubts about this being a good idea.


+1

contrib module should be renamed - more - current name is confusing against
usual functionality related to words CUBE and ROLLUP

Pavel



 --
 Andrew (irc:RhodiumToad)


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom I wonder if you've tried hard enough to avoid reserving the keyword.

 GROUP BY cube(a,b)  is currently legal syntax and means something completely
 incompatible to what the spec requires.

Well, if there are any extant applications that use that exact phrasing,
they're going to be broken in any case.  That does not mean that we have
to break every other appearance of cube.  I think that special-casing
appearances of cube(...) in GROUP BY lists might be a feasible approach.

Basically, I'm afraid that unilaterally renaming cube is going to break
enough applications that there will be more people who flat out don't
want this patch than there will be who get benefit from it, and we end
up voting to revert the feature altogether.  If you'd like to take that
risk then feel free to charge full steam ahead, but don't say you were
not warned.  And don't bother arguing that CUBE is reserved according to
the standard, because that will not make one damn bit of difference
to the people who will be unhappy.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Merlin Moncure
On Thu, Aug 21, 2014 at 1:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom I wonder if you've tried hard enough to avoid reserving the keyword.

 GROUP BY cube(a,b)  is currently legal syntax and means something completely
 incompatible to what the spec requires.

 Well, if there are any extant applications that use that exact phrasing,
 they're going to be broken in any case.  That does not mean that we have
 to break every other appearance of cube.  I think that special-casing
 appearances of cube(...) in GROUP BY lists might be a feasible approach.

 Basically, I'm afraid that unilaterally renaming cube is going to break
 enough applications that there will be more people who flat out don't
 want this patch than there will be who get benefit from it, and we end
 up voting to revert the feature altogether.  If you'd like to take that
 risk then feel free to charge full steam ahead, but don't say you were
 not warned.  And don't bother arguing that CUBE is reserved according to
 the standard, because that will not make one damn bit of difference
 to the people who will be unhappy.

I have to respectfully disagree.  Certainly, if there is some
reasonable way to not have to change 'cube' then great.  But the
tonnage rule applies here: even considering compatibility issues, when
considering the importance of standard SQL (and, I might add,
exceptionally useful) syntax and a niche extension, 'cube' is going to
have to get out of the way.  There are view valid reasons to break
compatibility but blocking standard syntax is definitely one of them.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Andrew Dunstan


On 08/21/2014 02:48 PM, Merlin Moncure wrote:


Basically, I'm afraid that unilaterally renaming cube is going to break
enough applications that there will be more people who flat out don't
want this patch than there will be who get benefit from it, and we end
up voting to revert the feature altogether.  If you'd like to take that
risk then feel free to charge full steam ahead, but don't say you were
not warned.  And don't bother arguing that CUBE is reserved according to
the standard, because that will not make one damn bit of difference
to the people who will be unhappy.

I have to respectfully disagree.  Certainly, if there is some
reasonable way to not have to change 'cube' then great.  But the
tonnage rule applies here: even considering compatibility issues, when
considering the importance of standard SQL (and, I might add,
exceptionally useful) syntax and a niche extension, 'cube' is going to
have to get out of the way.  There are view valid reasons to break
compatibility but blocking standard syntax is definitely one of them.




I'm inclined to think that the audience for this is far larger than the 
audience for the cube extension, which I have not once encountered in 
the field.


But I guess we all have different experiences.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I'm inclined to think that the audience for this is far larger than the 
 audience for the cube extension, which I have not once encountered in 
 the field.

Perhaps so.  I would really prefer not to have to get into estimating
how many people will be inconvenienced how badly.  It's clear to me
that not a lot of sweat has been put into seeing if we can avoid
reserving the keyword, and I think we need to put in that effort.
We've jumped through some pretty high hoops to avoid reserving keywords in
the past, so I don't think this patch should get a free pass on the issue.

Especially considering that renaming the cube extension isn't exactly
going to be zero work: there is no infrastructure for such a thing.
A patch consisting merely of s/cube/foobar/g isn't going to cut it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Andrew Dunstan and...@dunslane.net writes:
  I'm inclined to think that the audience for this is far larger than the 
  audience for the cube extension, which I have not once encountered in 
  the field.

+1

 Perhaps so.  I would really prefer not to have to get into estimating
 how many people will be inconvenienced how badly.  It's clear to me
 that not a lot of sweat has been put into seeing if we can avoid
 reserving the keyword, and I think we need to put in that effort.

I'm with Merlin on this one, it's going to end up happening and I don't
know that 9.5 is any worse than post-9.5 to make this change.

 We've jumped through some pretty high hoops to avoid reserving keywords in
 the past, so I don't think this patch should get a free pass on the issue.

This doesn't feel like an attempt to get a free pass on anything- it's
not being proposed as fully reserved and there is spec-defined syntax
which needs to be supported.  If we can get away with keeping it
unreserved while not making it utterly confusing for users and
convoluting the code, great, but that doesn't seem likely to pan out.

 Especially considering that renaming the cube extension isn't exactly
 going to be zero work: there is no infrastructure for such a thing.
 A patch consisting merely of s/cube/foobar/g isn't going to cut it.

This is a much more interesting challenge to deal with, but perhaps we
could include a perl script or pg_upgrade snippet for users to run to
see if they have the extension installed and to do some magic before the
actual upgrade to handle the rename..?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread David Fetter
On Thu, Aug 21, 2014 at 06:15:33PM -0400, Stephen Frost wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
  Andrew Dunstan and...@dunslane.net writes:
   I'm inclined to think that the audience for this is far larger than the 
   audience for the cube extension, which I have not once encountered in 
   the field.
 
 +1

I haven't seen it in the field either.

I'd also like to mention that the mere presence of a module in our
contrib/ directory can reflect bad decisions that need reversing just
as easily as it can the presence of vitally important utilities that
need to be preserved.  I'm pretty sure the cube extension arrived
after the CUBE keyword in SQL, which makes that an error on our part
if true.

  Especially considering that renaming the cube extension isn't
  exactly going to be zero work: there is no infrastructure for such
  a thing.  A patch consisting merely of s/cube/foobar/g isn't going
  to cut it.
 
 This is a much more interesting challenge to deal with, but perhaps
 we could include a perl script or pg_upgrade snippet for users to
 run to see if they have the extension installed and to do some magic
 before the actual upgrade to handle the rename..?

+1 for doing this.  Do we want to make some kind of generator for such
things?  It doesn't seem hard in principle, but I haven't tried coding
it up yet.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2014-08-21 Thread Andrew Gierth
 Stephen == Stephen Frost sfr...@snowman.net writes:

  I'm inclined to think that the audience for this is far larger
  than the audience for the cube extension, which I have not once
  encountered in the field.

 Stephen +1

Most of my encounters with cube have been me suggesting it to people
on IRC as a possible approach for solving certain kinds of performance
problems by converting them to N-dimensional spatial containment
queries. Sometimes this works well, but it doesn't seem to be an
approach that many people discover on their own.

  We've jumped through some pretty high hoops to avoid reserving
  keywords in the past, so I don't think this patch should get a
  free pass on the issue.

 Stephen This doesn't feel like an attempt to get a free pass on
 Stephen anything- it's not being proposed as fully reserved and
 Stephen there is spec-defined syntax which needs to be supported.
 Stephen If we can get away with keeping it unreserved while not
 Stephen making it utterly confusing for users and convoluting the
 Stephen code, great, but that doesn't seem likely to pan out.

Having now spent some more time looking, I believe there is a solution
which makes it unreserved which does not require any significant pain
in the code.  I'm not entirely convinced that this is the right
approach in the long term, but it might allow for a more planned
transition.

The absolute minimum seems to be:

 GROUPING as a col_name_keyword (since GROUPING(x,y,...) in the select
 list as a grouping operation looks like a function call for any
 argument types)

 CUBE, ROLLUP, SETS as unreserved_keyword

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers