Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Svenne == Svenne Krap sve...@krap.dk writes: Svenne I have the explains, Can you post the explain analyze outputs? If need be, you can anonymize the table and column names and any identifiers by using the anonymization option of explain.depesz.com, but please only do that if you actually need to. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Oh, and I build it on top of f92fc4c95ddcc25978354a8248d3df22269201bc On 20-04-2015 10:36, Svenne Krap wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, passed Spec compliant: not tested Documentation:tested, passed Hi, I have (finally) found time to review this. The syntax is as per spec as I can see, and the queries I have tested have all produced the correct output. The documentation looks good and is clear. I think it is spec compliant, but I am not used enough to the spec to be sure. Also I have not understood the function of set quantifier (DISTINCT,ALL) part in the group by clause (and hence not tested it). Hence I haven't marked the spec compliant part. The installcheck-world fails, but in src/pl/tcl/results/pltcl_queries.out (a sorting problem when looking at the diff) which should be unrelated to GSP. I don't know enough of the check to know if it has already run the GSP tests.. I have also been running a few tests on some real data. This is run on my laptop with 32 GB of memory and a fast SSD. The first dataset is a join between a data table of 472 MB (4,3 Mrows) and a tiny multi-column lookup table. I am returning a count(*). Here the data is hierarchical so CUBE does not make sense. GROUPING SETS and ROLLUP both works fine and if work_buffers are large enough it slightly beats the handwritten union all equivalent (runtimes as 7,6 seconds to 7,7 seconds). If work_buffers are the default 4MB the union-all-equivalent (UAE) beats the GS-query almost 2:1 due to disk spill (14,3 (GS) vs. 8,2 (UAE) seconds). The other query is on the same datatable as before, but with three columns (two calculated and one natural) for a cube. I am returning a count(*). First column is extract year from date column Second column is divide a value by something and truncate (i.e. make buckets) Third column is a litteral integer column. Here the GS-version is slightly slower than the UAE-version (17,5 vs. 14,2). Nothing obvious about why in the explain (analyze,buffers,costs,timing) . I have the explains, but as the dataset is semi-private and I don't have any easy way to edit out names in it, I will send it on request (non-disclosure from the recipient is of course a must) and not post it on the list. I think the feature is ready to be commited, but am unsure whether I am qualified to gauge that :) /Svenne The new status of this patch is: Ready for Committer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, passed Spec compliant: not tested Documentation:tested, passed Hi, I have (finally) found time to review this. The syntax is as per spec as I can see, and the queries I have tested have all produced the correct output. The documentation looks good and is clear. I think it is spec compliant, but I am not used enough to the spec to be sure. Also I have not understood the function of set quantifier (DISTINCT,ALL) part in the group by clause (and hence not tested it). Hence I haven't marked the spec compliant part. The installcheck-world fails, but in src/pl/tcl/results/pltcl_queries.out (a sorting problem when looking at the diff) which should be unrelated to GSP. I don't know enough of the check to know if it has already run the GSP tests.. I have also been running a few tests on some real data. This is run on my laptop with 32 GB of memory and a fast SSD. The first dataset is a join between a data table of 472 MB (4,3 Mrows) and a tiny multi-column lookup table. I am returning a count(*). Here the data is hierarchical so CUBE does not make sense. GROUPING SETS and ROLLUP both works fine and if work_buffers are large enough it slightly beats the handwritten union all equivalent (runtimes as 7,6 seconds to 7,7 seconds). If work_buffers are the default 4MB the union-all-equivalent (UAE) beats the GS-query almost 2:1 due to disk spill (14,3 (GS) vs. 8,2 (UAE) seconds). The other query is on the same datatable as before, but with three columns (two calculated and one natural) for a cube. I am returning a count(*). First column is extract year from date column Second column is divide a value by something and truncate (i.e. make buckets) Third column is a litteral integer column. Here the GS-version is slightly slower than the UAE-version (17,5 vs. 14,2). Nothing obvious about why in the explain (analyze,buffers,costs,timing) . I have the explains, but as the dataset is semi-private and I don't have any easy way to edit out names in it, I will send it on request (non-disclosure from the recipient is of course a must) and not post it on the list. I think the feature is ready to be commited, but am unsure whether I am qualified to gauge that :) /Svenne The new status of this patch is: Ready for Committer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Sun, Apr 12, 2015 at 2:26 PM, 彭瑞华 ruihua.p...@163.com wrote: I am using postgesql 9.4.0. Thanks for your great work on grouping sets patch effort. I am now compiling postgresql from source code 9.4.0 on Linux platform with [gsp-all.patch] successed and grouping function well, but failed on window platform(windows 2003 or window 7). It shows unrecognized keywords: grouping, cube, etc. Good suggestions or tips? thanks a lot! When reviewing or testing a patch targeted for integration in 9.5, you should avoid using a previous major version but you should apply the patch on a version based on one of the latest commits of master branch in the git repository of Postgres. I am guessing that you cannot get the patch compiling because of some code conflicts. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
ok, I will try it using git master branch source code. thanks! Of course, using gsp-all-latest.patch this time. At 2015-04-12 14:23:46, Michael Paquier michael.paqu...@gmail.com wrote: On Sun, Apr 12, 2015 at 2:26 PM, 彭瑞华 ruihua.p...@163.com wrote: I am using postgesql 9.4.0. Thanks for your great work on grouping sets patch effort. I am now compiling postgresql from source code 9.4.0 on Linux platform with [gsp-all.patch] successed and grouping function well, but failed on window platform(windows 2003 or window 7). It shows unrecognized keywords: grouping, cube, etc. Good suggestions or tips? thanks a lot! When reviewing or testing a patch targeted for integration in 9.5, you should avoid using a previous major version but you should apply the patch on a version based on one of the latest commits of master branch in the git repository of Postgres. I am guessing that you cannot get the patch compiling because of some code conflicts. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Dear: I am using postgesql 9.4.0. Thanks for your great work on grouping sets patch effort. I am now compiling postgresql from source code 9.4.0 on Linux platform with [gsp-all.patch] successed and grouping function well, but failed on window platform(windows 2003 or window 7). It shows unrecognized keywords: grouping, cube, etc. Good suggestions or tips? thanks a lot! henry
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On 18-03-2015 17:18, Svenne Krap wrote: I still need to check against the standard and I will run it against a non-trivival production load... hopefully I will finish up my review shortly after the weekend... I am still on it, but a little delayed. I hope to get it done this weekend. Svenne -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Svenne == Svenne Krap sve...@krap.dk writes: Svenne I still need to check against the standard and I will run it Svenne against a non-trivival production load... hopefully I will Svenne finish up my review shortly after the weekend... Thanks for the review so far; any progress? I'm quite interested in collecting samples of realistic grouping sets queries and their performance, for use in possible further optimization work. (I don't need full data or anything like that, just this query ran in x seconds on N million rows, which is fast enough/not fast enough/too slow to be any use) Let me know if there's anything you need... -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, passed Spec compliant: not tested Documentation:tested, passed This is a midway review, a later will complete it. The patch applies against 8d1f239003d0245dda636dfa6cf0add13bee69d6 and builds correctly. Make installcheck-world fails, but it seems to be somewhere totally unrelated (TCL pl)... The documentation is very well-written and the patch implements the documented syntax. I still need to check against the standard and I will run it against a non-trivival production load... hopefully I will finish up my review shortly after the weekend... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Patch from message (87d24iukc5@news-spur.riddles.org.uk) fails (tried to apply on top of ebc0f5e01d2f ), as b55722692 has broken up the line (in src/backend/optimizer/util/pathnode.c): pathnode-path.rows = estimate_num_groups(root, uniq_exprs, rel-rows); After patching the added parameter (NULL) in by hand, the build fails as src/backend/optimizer/path/indxpath.c:1953 misses the new argument as well - this change is not in the patch. /Svenne The new status of this patch is: Waiting on Author -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes: Heikki Uh, that's ugly. The EXPLAIN out I mean; as an implementation Heikki detail chaining the nodes might be reasonable. But the above Heikki gets unreadable if you have more than a few grouping sets. It's good for highlighting performance issues in EXPLAIN, too. Perhaps so, but that doesn't take away from Heikki's point: it's still ugly. I don't understand why the sorts can't all be nested under the GroupAggregate nodes. We have a number of nodes already (e.g. Append) that support an arbitrary number of children, and I don't see why we can't do the same thing here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Tue, Sep 9, 2014 at 11:19 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-09-09 16:01 GMT+02:00 Robert Haas robertmh...@gmail.com: On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes: Heikki Uh, that's ugly. The EXPLAIN out I mean; as an implementation Heikki detail chaining the nodes might be reasonable. But the above Heikki gets unreadable if you have more than a few grouping sets. It's good for highlighting performance issues in EXPLAIN, too. Perhaps so, but that doesn't take away from Heikki's point: it's still ugly. I don't understand why the sorts can't all be nested under the GroupAggregate nodes. We have a number of nodes already (e.g. Append) that support an arbitrary number of children, and I don't see why we can't do the same thing here. I don't think so showing sort and aggregation is bad idea. Both can have a different performance impacts Sure, showing the sort and aggregation steps is fine. But I don't see what advantage we get out of showing them like this: Aggregate - Sort - ChainAggregate - Sort - ChainAggregate - Sort When we could show them like this: Aggregate - Sort - Sort - Sort From both a display perspective and an implementation-complexity perspective, it seems appealing to have the Aggregate node feed the data to one sort after another, rather having it send the data down a very deep pipe. I might be missing something, of course. I don't want to presume that I'm smarter than Andrew, because Andrew is pretty smart. :-) But it seems odd to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
2014-09-09 16:01 GMT+02:00 Robert Haas robertmh...@gmail.com: On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes: Heikki Uh, that's ugly. The EXPLAIN out I mean; as an implementation Heikki detail chaining the nodes might be reasonable. But the above Heikki gets unreadable if you have more than a few grouping sets. It's good for highlighting performance issues in EXPLAIN, too. Perhaps so, but that doesn't take away from Heikki's point: it's still ugly. I don't understand why the sorts can't all be nested under the GroupAggregate nodes. We have a number of nodes already (e.g. Append) that support an arbitrary number of children, and I don't see why we can't do the same thing here. I don't think so showing sort and aggregation is bad idea. Both can have a different performance impacts -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Robert == Robert Haas robertmh...@gmail.com writes: Robert Sure, showing the sort and aggregation steps is fine. But I Robert don't see what advantage we get out of showing them like Robert this: Robert Aggregate Robert - Sort Robert- ChainAggregate Robert - Sort Robert - ChainAggregate Robert - Sort The advantage is that this is how the plan tree is actually structured. Robert When we could show them like this: Robert Aggregate Robert - Sort Robert - Sort Robert - Sort And we can't structure the plan tree like this, because then it wouldn't be a _tree_ any more. The Sort node expects to have a child node to fetch rows from, and it expects all the usual plan tree mechanics (initialization, rescan, etc.) to work on that child node. There's no way for the parent to feed data to the child. Robert From both a display perspective and an Robert implementation-complexity perspective, ... says the person who has never tried implementing it. Honestly, ChainAggregate is _trivial_ compared to trying to make the GroupAggregate code deal with multiple inputs, or trying to make some new sort of plumbing node to feed input to those sorts. (You'd think that it should be possible to use the existing CTE mechanics to do it, but n... the existing code is actively and ferociously hostile to the idea of adding new CTEs from within the planner.) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Robert Haas robertmh...@gmail.com writes: On Tue, Sep 9, 2014 at 12:01 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Honestly, ChainAggregate is _trivial_ compared to trying to make the GroupAggregate code deal with multiple inputs, or trying to make some new sort of plumbing node to feed input to those sorts. (You'd think that it should be possible to use the existing CTE mechanics to do it, but n... the existing code is actively and ferociously hostile to the idea of adding new CTEs from within the planner.) That's unfortunate. I'm less than convinced that it's true ... I've been meaning to find time to review this patch, but it sounds like it's getting to the point where I need to. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Tue, Sep 9, 2014 at 12:01 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Robert == Robert Haas robertmh...@gmail.com writes: Robert Sure, showing the sort and aggregation steps is fine. But I Robert don't see what advantage we get out of showing them like Robert this: Robert Aggregate Robert - Sort Robert- ChainAggregate Robert - Sort Robert - ChainAggregate Robert - Sort The advantage is that this is how the plan tree is actually structured. I do understand that. I am questioning (as I believe Heikki was also) whether it's structured correctly. Nobody is arguing for displaying the plan tree in a way that doesn't mirror it's actual structure, or at least I am not. The Sort node expects to have a child node to fetch rows from, and it expects all the usual plan tree mechanics (initialization, rescan, etc.) to work on that child node. There's no way for the parent to feed data to the child. OK, good point. So we do need something that can feed data from one part of the plan tree to another, like a CTE does. I still think it would be worth trying to see if there's a reasonable way to structure the plan tree so that it's flatter. Robert From both a display perspective and an Robert implementation-complexity perspective, ... says the person who has never tried implementing it. This comment to me reads rather sharply, and I don't feel that the tone of my email is such as to merit a rebuke. Honestly, ChainAggregate is _trivial_ compared to trying to make the GroupAggregate code deal with multiple inputs, or trying to make some new sort of plumbing node to feed input to those sorts. (You'd think that it should be possible to use the existing CTE mechanics to do it, but n... the existing code is actively and ferociously hostile to the idea of adding new CTEs from within the planner.) That's unfortunate. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Tom == Tom Lane t...@sss.pgh.pa.us writes: Honestly, ChainAggregate is _trivial_ compared to trying to make the GroupAggregate code deal with multiple inputs, or trying to make some new sort of plumbing node to feed input to those sorts. (You'd think that it should be possible to use the existing CTE mechanics to do it, but n... the existing code is actively and ferociously hostile to the idea of adding new CTEs from within the planner.) That's unfortunate. Tom I'm less than convinced that it's true ... Maybe you can figure out how, but I certainly didn't see a reasonable way. I would also question one aspect of the desirability - using the CTE mechanism has the downside of needing an extra tuplestore with the full input data set in it, whereas the chain mechanism only has aggregated data in its tuplestore which should be much smaller. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Mon, Aug 25, 2014 at 1:35 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: If you look at the latest patch post, there's a small patch in it that does nothing but unreserve the keywords and fix ruleutils to make deparse/parse work. The required fix to ruleutils is an example of violating your four kinds of keywords principle, but quoting keywords still works. I think it would be intolerable to lose the ability to quote keywords. That could easily create situations where there's no reasonable way to dump an older database in such a fashion that it can be reloaded into a newer database. So it's good that you avoided that. The four kinds of keywords principle is obviously much less absolute. We've talked before about introducing additional categories of keywords, and that might be a good thing to do for one reason or another. But I think it's not good to do it in a highly idiosyncratic way: I previously proposed reserving concurrently only when it follows CREATE INDEX, and not in any other context, but Tom argued that it had to become a type_func_name_keyword since users would be confused to find that concurrently (but not any other keyword) needed quoting there. In retrospect, I tend to think he probably had it right. There is a good amount of third-party software out there that tries to be smart about quoting PostgreSQL keywords - for example, pgAdmin has code for that, or did last I looked - so by making things more complicated, we run the risk not only of bugs in our own software but also bugs in other people's software, as well as user confusion. So I still think the right solution is probably to reserve CUBE across the board, and not just in the narrowest context that we can get away with. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Robert == Robert Haas robertmh...@gmail.com writes: Robert I can accept ugly code, but I feel strongly that we shouldn't Robert accept ugly semantics. Forcing cube to get out of the way Robert may not be pretty, but I think it will be much worse if we Robert violate the rule that quoting a keyword strips it of its Robert special meaning; or the rule that there are four kinds of Robert keywords and, if a keyword of a particular class is accepted Robert as an identifier in a given context, all other keywords in Robert that class will also be accepted as identifiers in that Robert context. Violating those rules could have not-fun-at-all Robert consequences like needing to pass additional context Robert information to ruleutils.c's quote_identifier() function, or Robert not being able to dump and restore a query from an older Robert version even with --quote-all-identifiers. Renaming the cube Robert type will suck for people who are using it; but it will only Robert have to be done once; weird stuff like the above will be with Robert us forever. If you look at the latest patch post, there's a small patch in it that does nothing but unreserve the keywords and fix ruleutils to make deparse/parse work. The required fix to ruleutils is an example of violating your four kinds of keywords principle, but quoting keywords still works. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Tom == Tom Lane t...@sss.pgh.pa.us writes: Tom Perhaps so. I would really prefer not to have to get into Tom estimating how many people will be inconvenienced how badly. Tom It's clear to me that not a lot of sweat has been put into Tom seeing if we can avoid reserving the keyword, and I think we Tom need to put in that effort. So, first nontrivial issue that crops up is this: if CUBE is unreserved, then ruleutils will output the string cube(a,b) for a function call to a function named cube, on the assumption that it will parse back as a single unit (which inside a GROUP BY is not true). Options: 1) when outputting GROUP BY clauses (and nothing else), put parens around anything that isn't provably atomic; or put parens around anything that might look like a function call; or put parens around anything that looks like a function call with a keyword name 2) when outputting any function call, add parens if the name is an unreserved keyword 3) when outputting any function call, quote the name if it is an unreserved keyword 4) something else? (This of course means that if someone has a cube() function call in a group by clause of a view, then upgrading will change the meaning of the view and possibly fail to create it; there seems to be no fix for this, not even using the latest pg_dump, since pg_dump relies on the old server's ruleutils) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
* Andrew Gierth (and...@tao11.riddles.org.uk) wrote: Having now spent some more time looking, I believe there is a solution which makes it unreserved which does not require any significant pain in the code. I'm not entirely convinced that this is the right approach in the long term, but it might allow for a more planned transition. The absolute minimum seems to be: GROUPING as a col_name_keyword (since GROUPING(x,y,...) in the select list as a grouping operation looks like a function call for any argument types) CUBE, ROLLUP, SETS as unreserved_keyword This means GROUP BY cube(x,y) GROUP BY (cube(x,y)) GROUP BY cube(x) all end up with different meanings though, right? I'm not sure that's really a better situation. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Andrew Gierth wrote: (This of course means that if someone has a cube() function call in a group by clause of a view, then upgrading will change the meaning of the view and possibly fail to create it; there seems to be no fix for this, not even using the latest pg_dump, since pg_dump relies on the old server's ruleutils) This sucks. Can we tweak pg_dump to check for presence of the cube extension, and if found refuse to dump unless a minor version older than some hardcoded version (known to have fixed ruleutils) is used? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Alvaro == Alvaro Herrera alvhe...@2ndquadrant.com writes: (This of course means that if someone has a cube() function call in a group by clause of a view, then upgrading will change the meaning of the view and possibly fail to create it; there seems to be no fix for this, not even using the latest pg_dump, since pg_dump relies on the old server's ruleutils) Alvaro This sucks. Can we tweak pg_dump to check for presence of Alvaro the cube extension, and if found refuse to dump unless a Alvaro minor version older than some hardcoded version (known to Alvaro have fixed ruleutils) is used? I honestly don't think it's worth it. cube() is not a function that really makes any sense in a GROUP BY, though of course someone could have written their own function called cube() that does something else; while this case is a problem, it is also likely to be vanishingly rare. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Thu, Aug 21, 2014 at 2:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Gierth and...@tao11.riddles.org.uk writes: Tom == Tom Lane t...@sss.pgh.pa.us writes: Tom I wonder if you've tried hard enough to avoid reserving the keyword. GROUP BY cube(a,b) is currently legal syntax and means something completely incompatible to what the spec requires. Well, if there are any extant applications that use that exact phrasing, they're going to be broken in any case. That does not mean that we have to break every other appearance of cube. I think that special-casing appearances of cube(...) in GROUP BY lists might be a feasible approach. Not really. As pointed out downthread, you can't distinguish cube from CUBE. We could fix that with a big enough hammer, of course, but it would be a mighty big hammer. More generally, I think it makes a lot of sense to work harder to reserve keywords less when there's no fundamental semantic conflict, but when there is, trying to do things like special case stuff depending on context results in a situation where some keywords are a little more reserved than others. As you pointed out in discussions of CREATE INDEX CONCURRENTLY, that's confusing: http://www.postgresql.org/message-id/10769.1261775...@sss.pgh.pa.us (refer second paragraph) I think we should: (1) Rename the cube extension. With a bat. I have yet to encounter a single user who is using it, but there probably are some. They'll have to get over it; GROUPING SETS is roughly a hundred times more important than the cube extension. The most I'd do to cater to existing users of the extension is provide an SQL script someplace that renames the extension and all of its containing objects so that you can do that before running pg_dump/pg_upgrade. (2) Reserve CUBE to the extent necessary to implement this feature. Some people won't like this, but that's always true when we reserve a keyword, and I don't think refusing to implement an SQL-standard feature for which there is considerable demand is the right way to fix that. Here are the last five keywords we partially or fully reserved: LATERAL (fully reserved), COLLATION (type_func_name), XMLEXISTS (col_name), BETWEEN (was type_func_name, became col_name), CONCURRENTLY (type_func_name). That takes us back to December 2009, so the rate at which we do this is just under one a year, and there haven't been many screams about it. I grant you that cube is a slightly more plausible identifier than any of those, but I don't think we should let the fact that we happen to have an extension with that name prejudice us too much about how common it really is. Mind you, I'm not trying to say that we don't need to be judicious in reserving keywords, or even adding them at all: I've argued against those things on numerous occasions, and have done work to let us get rid of keywords we've previously had. I just think that this is a big enough, important enough feature that we'll please more people than we disappoint. And I think trying to walk some middle way where we distinguish on context is going to be a mess. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Robert Haas robertmh...@gmail.com writes: On Thu, Aug 21, 2014 at 2:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, if there are any extant applications that use that exact phrasing, they're going to be broken in any case. That does not mean that we have to break every other appearance of cube. I think that special-casing appearances of cube(...) in GROUP BY lists might be a feasible approach. Not really. As pointed out downthread, you can't distinguish cube from CUBE. We could fix that with a big enough hammer, of course, but it would be a mighty big hammer. I'm not convinced of that; I think some creative hackery in the grammar might be able to deal with this. It would be a bit ugly, for sure, but if it works it would be a localized fix. Meanwhile, I don't believe that it's going to be possible to rename the cube extension in any way that's even remotely acceptable for its users (remotely acceptable here means pg_upgrade works, never mind what's going to be needed to fix their applications). So the proposal you are pushing is going to result in seriously teeing off some fraction of our userbase; and the argument why that would be acceptable seems to boil down to I think there are few enough of them that we don't have to care (an opinion based on little evidence IMO). I think it's worth investing some work, and perhaps accepting some ugly code, to try to avoid that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Fri, Aug 22, 2014 at 7:02 PM, Tom Lane t...@sss.pgh.pa.us wrote: So the proposal you are pushing is going to result in seriously teeing off some fraction of our userbase; and the argument why that would be acceptable seems to boil down to I think there are few enough of them that we don't have to care (an opinion based on little evidence IMO FWIW here's some evidence... Craig Kersteins did a talk on the statistics across the Heroku fleet: Here are the slides from 2013 though I think there's an updated slide deck with more recent numbers out there: https://speakerdeck.com/craigkerstiens/postgres-what-they-really-use Cube shows up as the number 9 most popular extension with about 1% of databases having it installed (tied with pg_crypto and earthdistance). That's a lot more than I would have expected actually. Personally I would love to change the name because I always found the name the most confusing thing about it. It took me forever to figure out what on earth a cube was. It's actually a vector data type which is actually a pretty useful idea. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Fri, Aug 22, 2014 at 2:02 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Aug 21, 2014 at 2:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, if there are any extant applications that use that exact phrasing, they're going to be broken in any case. That does not mean that we have to break every other appearance of cube. I think that special-casing appearances of cube(...) in GROUP BY lists might be a feasible approach. Not really. As pointed out downthread, you can't distinguish cube from CUBE. We could fix that with a big enough hammer, of course, but it would be a mighty big hammer. I'm not convinced of that; I think some creative hackery in the grammar might be able to deal with this. It would be a bit ugly, for sure, but if it works it would be a localized fix. Well, I have no idea how to do that. I think the only way you'd be able to is if you make productions like ColId and ColLabel return something different for a keyword than they do for an IDENT. And that's not going to be a localized change. If you've got another proposal, I'm all ears... Meanwhile, I don't believe that it's going to be possible to rename the cube extension in any way that's even remotely acceptable for its users (remotely acceptable here means pg_upgrade works, never mind what's going to be needed to fix their applications). So the proposal you are pushing is going to result in seriously teeing off some fraction of our userbase; and the argument why that would be acceptable seems to boil down to I think there are few enough of them that we don't have to care (an opinion based on little evidence IMO). The only hard statistics I am aware of are from Heroku. Peter Geoghegan was kind enough to find me the link: https://www.youtube.com/watch?v=MT2gzzbyWpw At around 8 minutes, he shows utilization statistics for cube at around 1% across their install base. That's higher than I would have guessed, so, eh, shows what I know. In any case, I'm not so much advocating not caring at all as confining the level of caring to what can be done without moving the earth. I think it's worth investing some work, and perhaps accepting some ugly code, to try to avoid that. I can accept ugly code, but I feel strongly that we shouldn't accept ugly semantics. Forcing cube to get out of the way may not be pretty, but I think it will be much worse if we violate the rule that quoting a keyword strips it of its special meaning; or the rule that there are four kinds of keywords and, if a keyword of a particular class is accepted as an identifier in a given context, all other keywords in that class will also be accepted as identifiers in that context. Violating those rules could have not-fun-at-all consequences like needing to pass additional context information to ruleutils.c's quote_identifier() function, or not being able to dump and restore a query from an older version even with --quote-all-identifiers. Renaming the cube type will suck for people who are using it; but it will only have to be done once; weird stuff like the above will be with us forever. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On 08/22/2014 02:42 PM, Greg Stark wrote: On Fri, Aug 22, 2014 at 7:02 PM, Tom Lane t...@sss.pgh.pa.us wrote: So the proposal you are pushing is going to result in seriously teeing off some fraction of our userbase; and the argument why that would be acceptable seems to boil down to I think there are few enough of them that we don't have to care (an opinion based on little evidence IMO FWIW here's some evidence... Craig Kersteins did a talk on the statistics across the Heroku fleet: Here are the slides from 2013 though I think there's an updated slide deck with more recent numbers out there: https://speakerdeck.com/craigkerstiens/postgres-what-they-really-use Cube shows up as the number 9 most popular extension with about 1% of databases having it installed (tied with pg_crypto and earthdistance). That's a lot more than I would have expected actually. That's an interesting statistic. What I'd be more interested in is finding out how many of those are actually using it as opposed to having loaded it into a database. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Fri, Aug 22, 2014 at 1:52 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Aug 22, 2014 at 2:02 PM, Tom Lane t...@sss.pgh.pa.us wrote: https://www.youtube.com/watch?v=MT2gzzbyWpw At around 8 minutes, he shows utilization statistics for cube at around 1% across their install base. That's higher than I would have guessed, so, eh, shows what I know. In any case, I'm not so much advocating not caring at all as confining the level of caring to what can be done without moving the earth. cube is a dependency for earthdistance and it's gotten some light advocacy throughout the years as the 'way to do it'. I tried it myself way back in the day and concluded a while back that the 'box' type + gist was better than earthdistance for bounding box operations -- it just seemed easier to understand and use. If you search the archives you'll probably find a couple of examples of me suggesting as such. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
* Andrew Dunstan (and...@dunslane.net) wrote: On 08/22/2014 02:42 PM, Greg Stark wrote: On Fri, Aug 22, 2014 at 7:02 PM, Tom Lane t...@sss.pgh.pa.us wrote: So the proposal you are pushing is going to result in seriously teeing off some fraction of our userbase; and the argument why that would be acceptable seems to boil down to I think there are few enough of them that we don't have to care (an opinion based on little evidence IMO FWIW here's some evidence... Craig Kersteins did a talk on the statistics across the Heroku fleet: Here are the slides from 2013 though I think there's an updated slide deck with more recent numbers out there: https://speakerdeck.com/craigkerstiens/postgres-what-they-really-use Cube shows up as the number 9 most popular extension with about 1% of databases having it installed (tied with pg_crypto and earthdistance). That's a lot more than I would have expected actually. That's an interesting statistic. What I'd be more interested in is finding out how many of those are actually using it as opposed to having loaded it into a database. Agreed- and how many of those have *every extension available* loaded... Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Fri, Aug 22, 2014 at 10:37 PM, Stephen Frost sfr...@snowman.net wrote: Agreed- and how many of those have *every extension available* loaded... Actually that was also in the talk.a few slides later. 0.7% -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
* Greg Stark (st...@mit.edu) wrote: On Fri, Aug 22, 2014 at 10:37 PM, Stephen Frost sfr...@snowman.net wrote: Agreed- and how many of those have *every extension available* loaded... Actually that was also in the talk.a few slides later. 0.7% So, 0.3% install cube w/o installing *every* extension..? That seems like the more relevant number then, to me anyway. Admittedly, it's non-zero, but it's also a rather small percentage.. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Tom == Tom Lane t...@sss.pgh.pa.us writes: Tom I'm not convinced of that; I think some creative hackery in the Tom grammar might be able to deal with this. Making GROUP BY CUBE(a,b) parse as grouping sets rather than as a function turned out to be the easy part: give CUBE a lower precedence than '(' (equal to the one for IDENT and various other unreserved keywords), and a rule that has an explicit CUBE '(' gets preferred over one that reduces the CUBE to an unreserved_keyword. The (relatively minor) ugliness required is mostly in the ruleutils logic to decide how to output a cube(...) function call in such a way that it doesn't get misparsed as a grouping set. See my other mail on that. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On 08/13/2014 09:43 PM, Atri Sharma wrote: Sorry, forgot to attach the patch for fixing cube in contrib, which breaks since we now reserve cube keyword. Please find attached the same. Ugh, that will make everyone using the cube extension unhappy. After this patch, they will have to quote contrib's cube type and functions every time. I think we should bite the bullet and rename the extension, and its cube type and functions. For an application, having to suddenly quote it has the same effect as renaming it; you'll have to find all the callers and change them. And in the long-run, it's clearly better to have an unambiguous name. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes: On 08/13/2014 09:43 PM, Atri Sharma wrote: Sorry, forgot to attach the patch for fixing cube in contrib, which breaks since we now reserve cube keyword. Please find attached the same. Heikki Ugh, that will make everyone using the cube extension Heikki unhappy. After this patch, they will have to quote contrib's Heikki cube type and functions every time. Heikki I think we should bite the bullet and rename the extension, I agree, the contrib/cube patch as posted is purely so we could test everything without having to argue over the new name first. (And it is posted separately from the main patch because of its length and utter boringness.) However, even if/when a new name is chosen, there's the question of how to make the upgrade path easiest. Once CUBE is reserved, up-to-date pg_dump will quote all uses of the cube type and function when dumping an older database (except inside function bodies of course), so there may be merit in keeping a cube domain over the new type, and maybe also merit in keeping the extension name. So what's the new type name going to be? cuboid? hypercube? geometric_cube? n_dimensional_box? -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
A progress update: Atri We envisage that handling of arbitrary grouping sets will be Atri best done by having the planner generating an Append of Atri multiple aggregation paths, presumably with some way of moving Atri the original input path to a CTE. We have not really explored Atri yet how hard this will be; suggestions are welcome. This idea was abandoned. Instead, we have implemented full support for arbitrary grouping sets by means of a chaining system: explain (verbose, costs off) select four, ten, hundred, count(*) from onek group by cube(four,ten,hundred); QUERY PLAN - GroupAggregate Output: four, ten, hundred, count(*) Grouping Sets: (onek.hundred, onek.four, onek.ten), (onek.hundred, onek.four), (onek.hundred), () - Sort Output: four, ten, hundred Sort Key: onek.hundred, onek.four, onek.ten - ChainAggregate Output: four, ten, hundred Grouping Sets: (onek.ten, onek.hundred), (onek.ten) - Sort Output: four, ten, hundred Sort Key: onek.ten, onek.hundred - ChainAggregate Output: four, ten, hundred Grouping Sets: (onek.four, onek.ten), (onek.four) - Sort Output: four, ten, hundred Sort Key: onek.four, onek.ten - Seq Scan on public.onek Output: four, ten, hundred (20 rows) The ChainAggregate nodes use a tuplestore to communicate with the GroupAggregate node at the top of the chain; they pass through input tuples unchanged, and write aggregated result rows to the tuplestore, which the top node then returns once it has finished its own result. The organization of the planner code seems to be actively hostile to any attempt to break out new CTEs on the fly, or to plan parts of the query more than once; the method above seems to be the easiest way to avoid those issues. Atri At this point we are more interested in design review rather Atri than necessarily committing this patch in its current state. This no longer applies; we expect to post within a day or two an updated patch with full functionality. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On 08/21/2014 01:28 PM, Andrew Gierth wrote: A progress update: Atri We envisage that handling of arbitrary grouping sets will be Atri best done by having the planner generating an Append of Atri multiple aggregation paths, presumably with some way of moving Atri the original input path to a CTE. We have not really explored Atri yet how hard this will be; suggestions are welcome. This idea was abandoned. Instead, we have implemented full support for arbitrary grouping sets by means of a chaining system: explain (verbose, costs off) select four, ten, hundred, count(*) from onek group by cube(four,ten,hundred); QUERY PLAN - GroupAggregate Output: four, ten, hundred, count(*) Grouping Sets: (onek.hundred, onek.four, onek.ten), (onek.hundred, onek.four), (onek.hundred), () - Sort Output: four, ten, hundred Sort Key: onek.hundred, onek.four, onek.ten - ChainAggregate Output: four, ten, hundred Grouping Sets: (onek.ten, onek.hundred), (onek.ten) - Sort Output: four, ten, hundred Sort Key: onek.ten, onek.hundred - ChainAggregate Output: four, ten, hundred Grouping Sets: (onek.four, onek.ten), (onek.four) - Sort Output: four, ten, hundred Sort Key: onek.four, onek.ten - Seq Scan on public.onek Output: four, ten, hundred (20 rows) Uh, that's ugly. The EXPLAIN out I mean; as an implementation detail chaining the nodes might be reasonable. But the above gets unreadable if you have more than a few grouping sets. The ChainAggregate nodes use a tuplestore to communicate with the GroupAggregate node at the top of the chain; they pass through input tuples unchanged, and write aggregated result rows to the tuplestore, which the top node then returns once it has finished its own result. Hmm, so there's a magic link between the GroupAggregate at the top and all the ChainAggregates, via the tuplestore. That may be fine, we have special rules in passing information between bitmap scan nodes too. But rather than chain multiple ChainAggregate nodes, how about just doing all the work in the top GroupAggregate node? Atri At this point we are more interested in design review rather Atri than necessarily committing this patch in its current state. This no longer applies; we expect to post within a day or two an updated patch with full functionality. Ok, cool - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Andrew Gierth and...@tao11.riddles.org.uk writes: Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes: Heikki I think we should bite the bullet and rename the extension, I agree, the contrib/cube patch as posted is purely so we could test everything without having to argue over the new name first. I wonder if you've tried hard enough to avoid reserving the keyword. I think that the cube extension is not going to be the only casualty if cube becomes a reserved word --- that seems like a name that could be in use in lots of applications. (What do you mean, 9.5 breaks our database for tracking office space?) It would be worth quite a bit of effort to avoid that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes: Heikki Uh, that's ugly. The EXPLAIN out I mean; as an implementation Heikki detail chaining the nodes might be reasonable. But the above Heikki gets unreadable if you have more than a few grouping sets. It's good for highlighting performance issues in EXPLAIN, too. 4096 grouping sets takes about a third of a second to plan and execute, but something like a minute to generate the EXPLAIN output. However, for more realistic sizes, plan time is not significant and explain takes only about 40ms for 256 grouping sets. (To avoid resource exhaustion issues, we have set a limit of, currently, 4096 grouping sets per query level. Without such a limit, it is easy to write queries that would take TBs of memory to parse or plan. MSSQL and DB2 have similar limits, I'm told.) The ChainAggregate nodes use a tuplestore to communicate with the GroupAggregate node at the top of the chain; they pass through input tuples unchanged, and write aggregated result rows to the tuplestore, which the top node then returns once it has finished its own result. Heikki Hmm, so there's a magic link between the GroupAggregate at Heikki the top and all the ChainAggregates, via the tuplestore. That Heikki may be fine, we have special rules in passing information Heikki between bitmap scan nodes too. Eh. It's far from a perfect solution, but the planner doesn't lend itself to perfect solutions. Heikki But rather than chain multiple ChainAggregate nodes, how Heikki about just doing all the work in the top GroupAggregate node? It was easier this way. (How would you expect to do it all in the top node when each subset of the grouping sets list needs to see the data in a different order?) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
2014-08-21 17:00 GMT+02:00 Tom Lane t...@sss.pgh.pa.us: Andrew Gierth and...@tao11.riddles.org.uk writes: Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes: Heikki I think we should bite the bullet and rename the extension, I agree, the contrib/cube patch as posted is purely so we could test everything without having to argue over the new name first. I wonder if you've tried hard enough to avoid reserving the keyword. I think that the cube extension is not going to be the only casualty if cube becomes a reserved word --- that seems like a name that could be in use in lots of applications. (What do you mean, 9.5 breaks our database for tracking office space?) It would be worth quite a bit of effort to avoid that. My prototypes worked without reserved keywords if I remember well but analyzer is relatively complex Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Tom == Tom Lane t...@sss.pgh.pa.us writes: I agree, the contrib/cube patch as posted is purely so we could test everything without having to argue over the new name first. Tom I wonder if you've tried hard enough to avoid reserving the keyword. GROUP BY cube(a,b) is currently legal syntax and means something completely incompatible to what the spec requires. GROUP BY GROUPING SETS (cube(a,b), c) -- is that cube(a,b) an expression to group on, or a list of grouping sets to expand? GROUP BY (cube(a,b)) -- should that be an error, or silently treat it as a function call rather than a grouping set? What about GROUP BY GROUPING SETS ((cube(a,b)) ? (both are errors in our patch) Accepting those as valid implies a degree of possible confusion that I personally regard as quite questionable. Previous discussion seemed to have accepted that contrib/cube was going to have to be renamed. Tom I think that the cube extension is not going to be the only Tom casualty if cube becomes a reserved word --- that seems like a Tom name that could be in use in lots of applications. (What do Tom you mean, 9.5 breaks our database for tracking office space?) Tom It would be worth quite a bit of effort to avoid that. It has been a reserved word in the spec since, what, 1999? and it is a reserved word in mssql, oracle, db2, etc.? It only needs to be a col_name_keyword, so it still works as a table or column name (as usual we are less strict than the spec in that respect). I'm looking into whether it can be made unreserved, but I have serious doubts about this being a good idea. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
2014-08-21 17:58 GMT+02:00 Andrew Gierth and...@tao11.riddles.org.uk: Tom == Tom Lane t...@sss.pgh.pa.us writes: I agree, the contrib/cube patch as posted is purely so we could test everything without having to argue over the new name first. Tom I wonder if you've tried hard enough to avoid reserving the keyword. GROUP BY cube(a,b) is currently legal syntax and means something completely incompatible to what the spec requires. GROUP BY GROUPING SETS (cube(a,b), c) -- is that cube(a,b) an expression to group on, or a list of grouping sets to expand? GROUP BY (cube(a,b)) -- should that be an error, or silently treat it as a function call rather than a grouping set? What about GROUP BY GROUPING SETS ((cube(a,b)) ? (both are errors in our patch) Accepting those as valid implies a degree of possible confusion that I personally regard as quite questionable. Previous discussion seemed to have accepted that contrib/cube was going to have to be renamed. Tom I think that the cube extension is not going to be the only Tom casualty if cube becomes a reserved word --- that seems like a Tom name that could be in use in lots of applications. (What do Tom you mean, 9.5 breaks our database for tracking office space?) Tom It would be worth quite a bit of effort to avoid that. It has been a reserved word in the spec since, what, 1999? and it is a reserved word in mssql, oracle, db2, etc.? It only needs to be a col_name_keyword, so it still works as a table or column name (as usual we are less strict than the spec in that respect). I'm looking into whether it can be made unreserved, but I have serious doubts about this being a good idea. +1 contrib module should be renamed - more - current name is confusing against usual functionality related to words CUBE and ROLLUP Pavel -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Andrew Gierth and...@tao11.riddles.org.uk writes: Tom == Tom Lane t...@sss.pgh.pa.us writes: Tom I wonder if you've tried hard enough to avoid reserving the keyword. GROUP BY cube(a,b) is currently legal syntax and means something completely incompatible to what the spec requires. Well, if there are any extant applications that use that exact phrasing, they're going to be broken in any case. That does not mean that we have to break every other appearance of cube. I think that special-casing appearances of cube(...) in GROUP BY lists might be a feasible approach. Basically, I'm afraid that unilaterally renaming cube is going to break enough applications that there will be more people who flat out don't want this patch than there will be who get benefit from it, and we end up voting to revert the feature altogether. If you'd like to take that risk then feel free to charge full steam ahead, but don't say you were not warned. And don't bother arguing that CUBE is reserved according to the standard, because that will not make one damn bit of difference to the people who will be unhappy. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Thu, Aug 21, 2014 at 1:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Gierth and...@tao11.riddles.org.uk writes: Tom == Tom Lane t...@sss.pgh.pa.us writes: Tom I wonder if you've tried hard enough to avoid reserving the keyword. GROUP BY cube(a,b) is currently legal syntax and means something completely incompatible to what the spec requires. Well, if there are any extant applications that use that exact phrasing, they're going to be broken in any case. That does not mean that we have to break every other appearance of cube. I think that special-casing appearances of cube(...) in GROUP BY lists might be a feasible approach. Basically, I'm afraid that unilaterally renaming cube is going to break enough applications that there will be more people who flat out don't want this patch than there will be who get benefit from it, and we end up voting to revert the feature altogether. If you'd like to take that risk then feel free to charge full steam ahead, but don't say you were not warned. And don't bother arguing that CUBE is reserved according to the standard, because that will not make one damn bit of difference to the people who will be unhappy. I have to respectfully disagree. Certainly, if there is some reasonable way to not have to change 'cube' then great. But the tonnage rule applies here: even considering compatibility issues, when considering the importance of standard SQL (and, I might add, exceptionally useful) syntax and a niche extension, 'cube' is going to have to get out of the way. There are view valid reasons to break compatibility but blocking standard syntax is definitely one of them. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On 08/21/2014 02:48 PM, Merlin Moncure wrote: Basically, I'm afraid that unilaterally renaming cube is going to break enough applications that there will be more people who flat out don't want this patch than there will be who get benefit from it, and we end up voting to revert the feature altogether. If you'd like to take that risk then feel free to charge full steam ahead, but don't say you were not warned. And don't bother arguing that CUBE is reserved according to the standard, because that will not make one damn bit of difference to the people who will be unhappy. I have to respectfully disagree. Certainly, if there is some reasonable way to not have to change 'cube' then great. But the tonnage rule applies here: even considering compatibility issues, when considering the importance of standard SQL (and, I might add, exceptionally useful) syntax and a niche extension, 'cube' is going to have to get out of the way. There are view valid reasons to break compatibility but blocking standard syntax is definitely one of them. I'm inclined to think that the audience for this is far larger than the audience for the cube extension, which I have not once encountered in the field. But I guess we all have different experiences. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Andrew Dunstan and...@dunslane.net writes: I'm inclined to think that the audience for this is far larger than the audience for the cube extension, which I have not once encountered in the field. Perhaps so. I would really prefer not to have to get into estimating how many people will be inconvenienced how badly. It's clear to me that not a lot of sweat has been put into seeing if we can avoid reserving the keyword, and I think we need to put in that effort. We've jumped through some pretty high hoops to avoid reserving keywords in the past, so I don't think this patch should get a free pass on the issue. Especially considering that renaming the cube extension isn't exactly going to be zero work: there is no infrastructure for such a thing. A patch consisting merely of s/cube/foobar/g isn't going to cut it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
* Tom Lane (t...@sss.pgh.pa.us) wrote: Andrew Dunstan and...@dunslane.net writes: I'm inclined to think that the audience for this is far larger than the audience for the cube extension, which I have not once encountered in the field. +1 Perhaps so. I would really prefer not to have to get into estimating how many people will be inconvenienced how badly. It's clear to me that not a lot of sweat has been put into seeing if we can avoid reserving the keyword, and I think we need to put in that effort. I'm with Merlin on this one, it's going to end up happening and I don't know that 9.5 is any worse than post-9.5 to make this change. We've jumped through some pretty high hoops to avoid reserving keywords in the past, so I don't think this patch should get a free pass on the issue. This doesn't feel like an attempt to get a free pass on anything- it's not being proposed as fully reserved and there is spec-defined syntax which needs to be supported. If we can get away with keeping it unreserved while not making it utterly confusing for users and convoluting the code, great, but that doesn't seem likely to pan out. Especially considering that renaming the cube extension isn't exactly going to be zero work: there is no infrastructure for such a thing. A patch consisting merely of s/cube/foobar/g isn't going to cut it. This is a much more interesting challenge to deal with, but perhaps we could include a perl script or pg_upgrade snippet for users to run to see if they have the extension installed and to do some magic before the actual upgrade to handle the rename..? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
On Thu, Aug 21, 2014 at 06:15:33PM -0400, Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Andrew Dunstan and...@dunslane.net writes: I'm inclined to think that the audience for this is far larger than the audience for the cube extension, which I have not once encountered in the field. +1 I haven't seen it in the field either. I'd also like to mention that the mere presence of a module in our contrib/ directory can reflect bad decisions that need reversing just as easily as it can the presence of vitally important utilities that need to be preserved. I'm pretty sure the cube extension arrived after the CUBE keyword in SQL, which makes that an error on our part if true. Especially considering that renaming the cube extension isn't exactly going to be zero work: there is no infrastructure for such a thing. A patch consisting merely of s/cube/foobar/g isn't going to cut it. This is a much more interesting challenge to deal with, but perhaps we could include a perl script or pg_upgrade snippet for users to run to see if they have the extension installed and to do some magic before the actual upgrade to handle the rename..? +1 for doing this. Do we want to make some kind of generator for such things? It doesn't seem hard in principle, but I haven't tried coding it up yet. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
Stephen == Stephen Frost sfr...@snowman.net writes: I'm inclined to think that the audience for this is far larger than the audience for the cube extension, which I have not once encountered in the field. Stephen +1 Most of my encounters with cube have been me suggesting it to people on IRC as a possible approach for solving certain kinds of performance problems by converting them to N-dimensional spatial containment queries. Sometimes this works well, but it doesn't seem to be an approach that many people discover on their own. We've jumped through some pretty high hoops to avoid reserving keywords in the past, so I don't think this patch should get a free pass on the issue. Stephen This doesn't feel like an attempt to get a free pass on Stephen anything- it's not being proposed as fully reserved and Stephen there is spec-defined syntax which needs to be supported. Stephen If we can get away with keeping it unreserved while not Stephen making it utterly confusing for users and convoluting the Stephen code, great, but that doesn't seem likely to pan out. Having now spent some more time looking, I believe there is a solution which makes it unreserved which does not require any significant pain in the code. I'm not entirely convinced that this is the right approach in the long term, but it might allow for a more planned transition. The absolute minimum seems to be: GROUPING as a col_name_keyword (since GROUPING(x,y,...) in the select list as a grouping operation looks like a function call for any argument types) CUBE, ROLLUP, SETS as unreserved_keyword -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers