Re: [HACKERS] Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
On Mon, Jul 15, 2013 at 11:43:04AM -0700, David Fetter wrote: On Sun, Jul 14, 2013 at 10:15:12PM -0400, Noah Misch wrote: See attached patch revisions. The first patch edits find_minmax_aggs_walker() per my comments just now. The second is an update of your FILTER patch with the changes to which I alluded above; it applies atop the first patch. Would you verify that I didn't ruin anything? Barring problems, will commit. Tested your changes. They pass regression, etc. :) Committed. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
On Sun, Jul 14, 2013 at 10:15:12PM -0400, Noah Misch wrote: On Sun, Jul 07, 2013 at 06:37:26PM -0700, David Fetter wrote: On Sat, Jul 06, 2013 at 11:49:21AM +0100, Dean Rasheed wrote: Overall I think this patch offers useful additional functionality, in compliance with the SQL spec, which should be handy to simplify complex grouping queries. As I understand this feature, it is syntactic sugar for the typical case of an aggregate with a strict transition function. For example, min(x) FILTER (WHERE y 0) is rigorously equivalent to min(CASE y 0 THEN x END). Every SQL aggregate is strict (ISO 9075-2 section 4.15.4), so for standard queries it is *only* syntactic sugar. In PostgreSQL, it lets you do novel things with, for example, array_agg(). Is that accurate? I think this is ready for committer. The patch was thorough. I updated applicable comments, revisited some cosmetic choices, and made these functional changes: 1. The pg_stat_statements query jumble should incorporate the filter. 2. The patch did not update costing. I made it add the cost of the filter expression the same way we add the cost of the argument expressions. This makes min(x) FILTER (WHERE y 0) match min(case y 0 THEN x end) in terms of cost, which is a fair start. At some point, we could do better by reducing the argument cost by the filter selectivity. A few choices/standard interpretations may deserve discussion. The patch makes filter clauses contribute to the subquery level chosen to be the aggregation query. This is observable through the behavior of these two standard-conforming queries: select (select count(outer_c) from (values (1)) t0(inner_c)) from (values (2),(3)) t1(outer_c); -- outer query is aggregation query select (select count(outer_c) filter (where inner_c 0) from (values (1)) t0(inner_c)) from (values (2),(3)) t1(outer_c); -- inner query is aggregation query I believe SQL (ISO 9075-2 section 6.9 SR 3,4,6) does require this. Since that still isn't crystal-clear to me, I mention it in case anyone has a different reading. Distinct from that, albeit in a similar vein, SQL does not permit outer references in a filter clause. This patch permits them; I think that qualifies as a reasonable PostgreSQL extension. --- a/doc/src/sgml/keywords.sgml +++ b/doc/src/sgml/keywords.sgml @@ -3200,7 +3200,7 @@ /row row entrytokenOVER/token/entry -entryreserved (can be function or type)/entry +entrynon-reserved/entry I committed this one-line correction separately. --- a/src/backend/optimizer/plan/planagg.c +++ b/src/backend/optimizer/plan/planagg.c @@ -314,7 +314,7 @@ find_minmax_aggs_walker(Node *node, List **context) ListCell *l; Assert(aggref-agglevelsup == 0); - if (list_length(aggref-args) != 1 || aggref-aggorder != NIL) + if (list_length(aggref-args) != 1 || aggref-aggorder != NIL || aggref-agg_filter != NULL) return true;/* it couldn't be MIN/MAX */ /* note: we do not care if DISTINCT is mentioned ... */ I twitched upon reading this, because neither ORDER BY nor FILTER preclude the aggregate being MIN or MAX. Perhaps Andrew can explain why he put aggorder there back in 2009. All I can figure is that writing max(c ORDER BY x) is so unlikely that we'd too often waste the next syscache lookup. But the same argument would apply to DISTINCT. With FILTER, the rationale is entirely different. The aggregate could well be MIN/MAX; we just haven't implemented the necessary support elsewhere in this file. See attached patch revisions. The first patch edits find_minmax_aggs_walker() per my comments just now. The second is an update of your FILTER patch with the changes to which I alluded above; it applies atop the first patch. Would you verify that I didn't ruin anything? Barring problems, will commit. Are you the sole named author of this patch? That's what the CF page says, but that wasn't fully clear to me from the list discussions. Thanks, nm Tested your changes. They pass regression, etc. :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
On Mon, Jul 15, 2013 at 06:56:17PM +, Andrew Gierth wrote: Noah Misch said: I twitched upon reading this, because neither ORDER BY nor FILTER preclude the aggregate being MIN or MAX. Perhaps Andrew can explain why he put aggorder there back in 2009. The bottom line is that I intentionally avoided assuming that an agg with an aggsortop could only be min() or max() and that having an order by clause would always be harmless in such cases. I can't think of an actual use case where it would matter, but I've seen people define some pretty strange aggs recently. So I mildly object to simply throwing away the ORDER BY clause in such cases. I can't think of another use for aggsortop as defined today. However, on further reflection, min(x ORDER BY y) is not identical to min(x) when the B-tree operator class of the aggsortop can find non-identical datums to be equal. This affects at least min(numeric) and min(interval). min(x) chooses an unspecified x among those equal to the smallest x, while min(x ORDER BY y) can be used to narrow the choice. I will update the comments along those lines and not change semantics after all. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
On Sun, Jul 07, 2013 at 06:37:26PM -0700, David Fetter wrote: On Sat, Jul 06, 2013 at 11:49:21AM +0100, Dean Rasheed wrote: Overall I think this patch offers useful additional functionality, in compliance with the SQL spec, which should be handy to simplify complex grouping queries. As I understand this feature, it is syntactic sugar for the typical case of an aggregate with a strict transition function. For example, min(x) FILTER (WHERE y 0) is rigorously equivalent to min(CASE y 0 THEN x END). Every SQL aggregate is strict (ISO 9075-2 section 4.15.4), so for standard queries it is *only* syntactic sugar. In PostgreSQL, it lets you do novel things with, for example, array_agg(). Is that accurate? I think this is ready for committer. The patch was thorough. I updated applicable comments, revisited some cosmetic choices, and made these functional changes: 1. The pg_stat_statements query jumble should incorporate the filter. 2. The patch did not update costing. I made it add the cost of the filter expression the same way we add the cost of the argument expressions. This makes min(x) FILTER (WHERE y 0) match min(case y 0 THEN x end) in terms of cost, which is a fair start. At some point, we could do better by reducing the argument cost by the filter selectivity. A few choices/standard interpretations may deserve discussion. The patch makes filter clauses contribute to the subquery level chosen to be the aggregation query. This is observable through the behavior of these two standard-conforming queries: select (select count(outer_c) from (values (1)) t0(inner_c)) from (values (2),(3)) t1(outer_c); -- outer query is aggregation query select (select count(outer_c) filter (where inner_c 0) from (values (1)) t0(inner_c)) from (values (2),(3)) t1(outer_c); -- inner query is aggregation query I believe SQL (ISO 9075-2 section 6.9 SR 3,4,6) does require this. Since that still isn't crystal-clear to me, I mention it in case anyone has a different reading. Distinct from that, albeit in a similar vein, SQL does not permit outer references in a filter clause. This patch permits them; I think that qualifies as a reasonable PostgreSQL extension. --- a/doc/src/sgml/keywords.sgml +++ b/doc/src/sgml/keywords.sgml @@ -3200,7 +3200,7 @@ /row row entrytokenOVER/token/entry -entryreserved (can be function or type)/entry +entrynon-reserved/entry I committed this one-line correction separately. --- a/src/backend/optimizer/plan/planagg.c +++ b/src/backend/optimizer/plan/planagg.c @@ -314,7 +314,7 @@ find_minmax_aggs_walker(Node *node, List **context) ListCell *l; Assert(aggref-agglevelsup == 0); - if (list_length(aggref-args) != 1 || aggref-aggorder != NIL) + if (list_length(aggref-args) != 1 || aggref-aggorder != NIL || aggref-agg_filter != NULL) return true;/* it couldn't be MIN/MAX */ /* note: we do not care if DISTINCT is mentioned ... */ I twitched upon reading this, because neither ORDER BY nor FILTER preclude the aggregate being MIN or MAX. Perhaps Andrew can explain why he put aggorder there back in 2009. All I can figure is that writing max(c ORDER BY x) is so unlikely that we'd too often waste the next syscache lookup. But the same argument would apply to DISTINCT. With FILTER, the rationale is entirely different. The aggregate could well be MIN/MAX; we just haven't implemented the necessary support elsewhere in this file. See attached patch revisions. The first patch edits find_minmax_aggs_walker() per my comments just now. The second is an update of your FILTER patch with the changes to which I alluded above; it applies atop the first patch. Would you verify that I didn't ruin anything? Barring problems, will commit. Are you the sole named author of this patch? That's what the CF page says, but that wasn't fully clear to me from the list discussions. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com *** a/src/backend/optimizer/plan/planagg.c --- b/src/backend/optimizer/plan/planagg.c *** *** 314,328 find_minmax_aggs_walker(Node *node, List **context) ListCell *l; Assert(aggref-agglevelsup == 0); ! if (list_length(aggref-args) != 1 || aggref-aggorder != NIL) return true;/* it couldn't be MIN/MAX */ ! /* note: we do not care if DISTINCT is mentioned ... */ ! curTarget = (TargetEntry *) linitial(aggref-args); aggsortop = fetch_agg_sort_op(aggref-aggfnoid); if (!OidIsValid(aggsortop)) return true;/* not a MIN/MAX aggregate */ if (contain_mutable_functions((Node *) curTarget-expr)) return true;/* not potentially
[HACKERS] Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
On Sun, Jul 14, 2013 at 10:15:12PM -0400, Noah Misch wrote: On Sun, Jul 07, 2013 at 06:37:26PM -0700, David Fetter wrote: On Sat, Jul 06, 2013 at 11:49:21AM +0100, Dean Rasheed wrote: Overall I think this patch offers useful additional functionality, in compliance with the SQL spec, which should be handy to simplify complex grouping queries. As I understand this feature, it is syntactic sugar for the typical case of an aggregate with a strict transition function. For example, min(x) FILTER (WHERE y 0) is rigorously equivalent to min(CASE y 0 THEN x END). Every SQL aggregate is strict (ISO 9075-2 section 4.15.4), so for standard queries it is *only* syntactic sugar. In PostgreSQL, it lets you do novel things with, for example, array_agg(). Is that accurate? I think this is ready for committer. The patch was thorough. I updated applicable comments, revisited some cosmetic choices, and made these functional changes: 1. The pg_stat_statements query jumble should incorporate the filter. 2. The patch did not update costing. I made it add the cost of the filter expression the same way we add the cost of the argument expressions. This makes min(x) FILTER (WHERE y 0) match min(case y 0 THEN x end) in terms of cost, which is a fair start. At some point, we could do better by reducing the argument cost by the filter selectivity. Thanks! A few choices/standard interpretations may deserve discussion. The patch makes filter clauses contribute to the subquery level chosen to be the aggregation query. This is observable through the behavior of these two standard-conforming queries: select (select count(outer_c) from (values (1)) t0(inner_c)) from (values (2),(3)) t1(outer_c); -- outer query is aggregation query select (select count(outer_c) filter (where inner_c 0) from (values (1)) t0(inner_c)) from (values (2),(3)) t1(outer_c); -- inner query is aggregation query I believe SQL (ISO 9075-2 section 6.9 SR 3,4,6) does require this. Since that still isn't crystal-clear to me, I mention it in case anyone has a different reading. Distinct from that, albeit in a similar vein, SQL does not permit outer references in a filter clause. This patch permits them; I think that qualifies as a reasonable PostgreSQL extension. Thanks again. --- a/doc/src/sgml/keywords.sgml +++ b/doc/src/sgml/keywords.sgml @@ -3200,7 +3200,7 @@ /row row entrytokenOVER/token/entry -entryreserved (can be function or type)/entry +entrynon-reserved/entry I committed this one-line correction separately. --- a/src/backend/optimizer/plan/planagg.c +++ b/src/backend/optimizer/plan/planagg.c @@ -314,7 +314,7 @@ find_minmax_aggs_walker(Node *node, List **context) ListCell *l; Assert(aggref-agglevelsup == 0); - if (list_length(aggref-args) != 1 || aggref-aggorder != NIL) + if (list_length(aggref-args) != 1 || aggref-aggorder != NIL || aggref-agg_filter != NULL) return true;/* it couldn't be MIN/MAX */ /* note: we do not care if DISTINCT is mentioned ... */ I twitched upon reading this, because neither ORDER BY nor FILTER preclude the aggregate being MIN or MAX. Perhaps Andrew can explain why he put aggorder there back in 2009. All I can figure is that writing max(c ORDER BY x) is so unlikely that we'd too often waste the next syscache lookup. But the same argument would apply to DISTINCT. With FILTER, the rationale is entirely different. The aggregate could well be MIN/MAX; we just haven't implemented the necessary support elsewhere in this file. Excellent reasoning, and good catch. See attached patch revisions. The first patch edits find_minmax_aggs_walker() per my comments just now. The second is an update of your FILTER patch with the changes to which I alluded above; it applies atop the first patch. Would you verify that I didn't ruin anything? Barring problems, will commit. Are you the sole named author of this patch? That's what the CF page says, but that wasn't fully clear to me from the list discussions. While Andrew's help was invaluable and pervasive, I wrote the patch, and all flaws in it are my fault. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
On Thu, Jun 27, 2013 at 08:41:59AM +, Andrew Gierth wrote: Tom Lane said: Agreed, separating out the function-call-with-trailing-declaration syntaxes so they aren't considered in FROM and index_elem seems like the best compromise. If we do that for window function OVER clauses as well, can we make OVER less reserved? Yes. At least, I tried it with both OVER and FILTER unreserved and there were no grammar conflicts (and I didn't have to do anything fancy to avoid them), and it passed regression with the exception of the changed error message for window functions in the from-clause. So is this the final decision on how to proceed? It seems good to me, and I can work with David to get it done. If this is really the direction people want to go, I'm in. Is there some code I can look at? I still submit that having our reserved word ducks in a row in advance is a saner way to go about this, and will work up a patch for that as I have time. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
On Mon, Jun 24, 2013 at 3:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: Or maybe they really don't give a damn about breaking applications every time they invent a new reserved word? I think this is the obvious conclusion. In the standard the reserved words are pretty explicitly reserved and not legal column names, no? I think their model is that applications work with a certain version of SQL and they're not expected to work with a new version without extensive updating. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
Greg Stark st...@mit.edu writes: I think their model is that applications work with a certain version of SQL and they're not expected to work with a new version without extensive updating. Hm. We could invent a sql_version parameter and tweak the lexer to return keywords added in spec versions later than that as IDENT. However, I fear such a parameter would be a major PITA from the user's standpoint, just like most of our backwards-compatibility GUCs have proven to be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers