Further questions about WITHIN GROUP: I believe that the spec requires that the "direct" arguments of an inverse or hypothetical-set aggregate must not contain any Vars of the current query level. They don't manage to say that in plain English, of course, but in the <hypothetical set function> case the behavior is defined in terms of this sub-select:
( SELECT 0, SK1, ..., SKK FROM TNAME UNION ALL VALUES (1, VE1, ..., VEK) ) where SKn are the sort key values, TNAME is an alias for the input table, and VEn are the direct arguments. Any input-table Vars the aggregate might refer to are thus in scope only for the SKn not the VEn. (This definition also makes it clear that the VEn are to be evaluated once, not once per row.) In the <inverse distribution function> case, they resort to claiming that the value of the <inverse distribution function argument> can be replaced by a numeric literal, which again makes it clear that it's supposed to be evaluated just once. So that's all fine, but the patch seems a bit confused about it. If you try to cheat, you get an error message that's less than apropos: regression=# select cume_dist(f1) within group(order by f1) from text_tbl ; ERROR: column "text_tbl.f1" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: select cume_dist(f1) within group(order by f1) from text_tbl... ^ I'd have hoped for a message along the line of "fixed arguments of cume_dist() must not contain variables", similar to the phrasing we use when you try to put a variable in LIMIT. Also, there are some comments and code changes in nodeAgg.c that seem to be on the wrong page here: + /* + * Use the representative input tuple for any references to + * non-aggregated input columns in the qual and tlist. (If we are not + * grouping, and there are no input rows at all, we will come here + * with an empty firstSlot ... but if not grouping, there can't be any + * references to non-aggregated input columns, so no problem.) + * We do this before finalizing because for ordered set functions, + * finalize_aggregates can evaluate arguments referencing the tuple. + */ + econtext->ecxt_outertuple = firstSlot; + The last two lines of that comment are new, all the rest was moved from someplace else. Those last two lines are wrong according to the above reasoning, and if they were correct the argument made in the pre-existing part of the comment would be all wet, meaning the code could in fact crash when trying to evaluate a Var reference in finalize_aggregates. So I'm inclined to undo this part of the patch (and anything else that's rearranging logic with an eye to Var references in finalize_aggregates), and to try to fix parse_agg.c so it gives a more specific error message for this case. After looking at this I'm a bit less enthused about the notion of hybrid aggregates than I was before. I now see that the spec intends that when the WITHIN GROUP notation is used, *all* the arguments to the left of WITHIN GROUP are meant to be fixed evaluate-once arguments. While we could possibly define aggregates for which some of those argument positions are treated as evaluate-once-per-row arguments, I'm realizing that that'd likely be pretty darn confusing for users. What I now think we should do about the added pg_aggregate columns is to have: aggnfixedargs int number of fixed arguments, excluding any hypothetical ones (always 0 for normal aggregates) aggkind char 'n' for normal aggregate, 'o' for ordered set function, 'h' for hypothetical-set function with the parsing rules that given agg( n fixed arguments ) WITHIN GROUP ( ORDER BY k sort specifications ) 1. WITHIN GROUP is disallowed for normal aggregates. 2. For an ordered set function, n must equal aggnfixedargs. We treat all n fixed arguments as contributing to the aggregate's result collation, but ignore the sort arguments. 3. For a hypothetical-set function, n must equal aggnfixedargs plus k, and we match up type and collation info of the last k fixed arguments with the corresponding sort arguments. The first n-k fixed arguments contribute to the aggregate's result collation, the rest not. Reading back over this email, I see I've gone back and forth between using the terms "direct args" and "fixed args" for the evaluate-once stuff to the left of WITHIN GROUP. I guess I'm not really sold on either terminology, but we need something we can use consistently in the code and docs. The spec is no help, it has no generic term at all for these args. Does anybody else have a preference, or maybe another suggestion entirely? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers