Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Tom Lane Wrote: Well, this certainly demonstrates that the check I added to parseCheckAggregates is wrongly placed, but I'm not sure we really need to forbid the case. David's example query seems to give sane answers once the bug in begin_partition is fixed: parentpart | childpart | quantity | rn +---+--+ KITCHEN| TABLE |1 | 1 KITCHEN| COOKER|1 | 2 KITCHEN| FRIDGE|1 | 3 TABLE | CHAIR |4 | 1 CHAIR | LEG |4 | 1 (5 rows) For what it's worth I've been looking into how DB2 and Sybase handle this. DB2 seems to disallow any functions in the SELECT list of the recursive part of the query. Error message is a little long winded to show here. It's also very generic and also covers GROUP Bys and HAVINGs saying that they're also not allowed. However, Sybase does allow this query. I did modify the window's ORDER BY as previously the order was undefined. The results match PostgreSQL. Also while testing I noticed that this query didn't error out when it should have: (Of course I only noticed because Sybase did) WITH RECURSIVE bom(parentpart,childpart,quantity,rn) AS ( SELECT parentpart,childpart,quantity,ROW_NUMBER() OVER (ORDER BY parentpart,childpart) FROM billofmaterials WHERE parentpart = 'KITCHEN' UNION ALL SELECT b.parentpart,b.childpart,b.quantity,ROW_NUMBER() OVER (ORDER BY parentpart,childpart) FROM billofmaterials b,bom WHERE b.parentpart = bom.childpart ) SELECT * FROM bom; Notice the ORDER BY in the recursive part of the query orders by an ambiguous column without complaint. If I replace b.quantity with just quantity it does error there. So seems to just not be picking up the problem in the window clause. David. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
David Rowley dgrow...@gmail.com writes: Also while testing I noticed that this query didn't error out when it should have: (Of course I only noticed because Sybase did) WITH RECURSIVE bom(parentpart,childpart,quantity,rn) AS ( SELECT parentpart,childpart,quantity,ROW_NUMBER() OVER (ORDER BY parentpart,childpart) FROM billofmaterials WHERE parentpart = 'KITCHEN' UNION ALL SELECT b.parentpart,b.childpart,b.quantity,ROW_NUMBER() OVER (ORDER BY parentpart,childpart) FROM billofmaterials b,bom WHERE b.parentpart = bom.childpart ) SELECT * FROM bom; Notice the ORDER BY in the recursive part of the query orders by an ambiguous column without complaint. Actually, it's not ambiguous according to our interpretation of ORDER BY clauses: the first attempt is to match an output column name, so it's seizing on the first SELECT column (b.parentpart) as being the intended sort key for parentpart, and similarly for childpart. You'd get the same thing if you did ORDER BY 1,2. We could disable all those special rules for window cases, but then we'd have to document how window ORDER BY is different from query ORDER BY, etc. I think it'd be more confusing not less so. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Tom Lane Wrote: Actually, it's not ambiguous according to our interpretation of ORDER BY clauses: the first attempt is to match an output column name, so it's seizing on the first SELECT column (b.parentpart) as being the intended sort key for parentpart, and similarly for childpart. You'd get the same thing if you did ORDER BY 1,2. Aha, I see. Well I learned something new tonight. I didn't realise that Postgres would be that intelligent about it. It makes total sense. I probably should have known this, but I'll forgive myself as I'm one of these people that will prefix all column names when joining in case I ever add new conflicting columns. end of excuse We could disable all those special rules for window cases, but then we'd have to document how window ORDER BY is different from query ORDER BY, etc. I think it'd be more confusing not less so. Without any further thought, I vote to leave it how it is. David. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Tom Lane Wrote: I've spent quite a bit of time reviewing the window functions patch, and I think it is now ready to commit, other than the documentation (which I've not looked at yet at all). Attached is my current patch against HEAD, sans documentation. This incorporates the recently discussed aggregate-function API changes and support for tuplestore trimming. There's a number of things that could be improved yet: * we really ought to have some support for non-built-in window functions * I think the planner could be a bit smarter about when to sort or not * tuplestore_advance and related code really needs to be made more efficient; it didn't matter much before but it does now but I think these things can be worked on after the core patch is committed. regards, tom lane I've started running my test queries that I used when reviewing the patch. The following crashes the backend: CREATE TABLE billofmaterials ( parentpart VARCHAR(20) NOT NULL, childpart VARCHAR(20) NOT NULL, quantity FLOAT NOT NULL, CHECK(quantity 0), PRIMARY KEY(parentpart, childpart) ); INSERT INTO billofmaterials VALUES('KITCHEN','TABLE',1); INSERT INTO billofmaterials VALUES('KITCHEN','COOKER',1); INSERT INTO billofmaterials VALUES('KITCHEN','FRIDGE',1); INSERT INTO billofmaterials VALUES('TABLE','CHAIR',4); INSERT INTO billofmaterials VALUES('CHAIR','LEG',4); WITH RECURSIVE bom AS ( SELECT parentpart,childpart,quantity,ROW_NUMBER() OVER (ORDER BY parentpart DESC) rn FROM billofmaterials WHERE parentpart = 'KITCHEN' UNION ALL SELECT b.parentpart,b.childpart,b.quantity,ROW_NUMBER() OVER (ORDER BY parentpart ASC) rn FROM billofmaterials b INNER JOIN bom ON b.parentpart = bom.childpart ) SELECT * from bom; It seems not to like recursively calling row_number(). It does not crash if I replace the 2nd row_number() with the constant 1 I compared everything to Oracle again and found no differences in results. These tests test all window functions in some way or another. I compared all results to Oracle 10g results apart from the queries that have NTH_VALUE as this is not implemented by Oracle 10g. Also seems like NTH_VALUE is not implemented by DB2 9.5 either. Anyone know of any database that does have NTH_VALUE? David. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/28 David Rowley dgrow...@gmail.com: Tom Lane Wrote: I've spent quite a bit of time reviewing the window functions patch, and I think it is now ready to commit, other than the documentation (which I've not looked at yet at all). Attached is my current patch against HEAD, sans documentation. This incorporates the recently discussed aggregate-function API changes and support for tuplestore trimming. There's a number of things that could be improved yet: * we really ought to have some support for non-built-in window functions * I think the planner could be a bit smarter about when to sort or not * tuplestore_advance and related code really needs to be made more efficient; it didn't matter much before but it does now but I think these things can be worked on after the core patch is committed. regards, tom lane I've started running my test queries that I used when reviewing the patch. The following crashes the backend: CREATE TABLE billofmaterials ( parentpart VARCHAR(20) NOT NULL, childpart VARCHAR(20) NOT NULL, quantity FLOAT NOT NULL, CHECK(quantity 0), PRIMARY KEY(parentpart, childpart) ); INSERT INTO billofmaterials VALUES('KITCHEN','TABLE',1); INSERT INTO billofmaterials VALUES('KITCHEN','COOKER',1); INSERT INTO billofmaterials VALUES('KITCHEN','FRIDGE',1); INSERT INTO billofmaterials VALUES('TABLE','CHAIR',4); INSERT INTO billofmaterials VALUES('CHAIR','LEG',4); WITH RECURSIVE bom AS ( SELECT parentpart,childpart,quantity,ROW_NUMBER() OVER (ORDER BY parentpart DESC) rn FROM billofmaterials WHERE parentpart = 'KITCHEN' UNION ALL SELECT b.parentpart,b.childpart,b.quantity,ROW_NUMBER() OVER (ORDER BY parentpart ASC) rn FROM billofmaterials b INNER JOIN bom ON b.parentpart = bom.childpart ) SELECT * from bom; It seems not to like recursively calling row_number(). It does not crash if I replace the 2nd row_number() with the constant 1 It seems that parseCheckWindowFuncs() doesn't check CTE case whereas parseCheckAggregates() checks it, including check of window functions. So the urgent patch is as following, but is this operation allowed? I am blind in CTE rules... *** src/backend/parser/parse_agg.c.orig Sun Dec 28 15:34:21 2008 --- src/backend/parser/parse_agg.c Mon Dec 29 01:13:16 2008 *** *** 336,347 errmsg(aggregate functions not allowed in a recursive query's recursive term), parser_errposition(pstate, locate_agg_of_level((Node *) qry, 0; - if (pstate-p_hasWindowFuncs hasSelfRefRTEs) - ereport(ERROR, - (errcode(ERRCODE_INVALID_RECURSION), -errmsg(window functions not allowed in a recursive query's recursive term), -parser_errposition(pstate, - locate_windowfunc((Node *) qry; } /* --- 336,341 *** *** 357,366 --- 351,374 parseCheckWindowFuncs(ParseState *pstate, Query *qry) { ListCell *l; + boolhasSelfRefRTEs; /* This should only be called if we found window functions */ Assert(pstate-p_hasWindowFuncs); + /* +* Scan the range table to see if there are JOIN or self-reference CTE +* entries. We'll need this info below. +*/ + hasSelfRefRTEs = false; + foreach(l, pstate-p_rtable) + { + RangeTblEntry *rte = (RangeTblEntry *) lfirst(l); + + if (rte-rtekind != RTE_JOIN rte-rtekind == RTE_CTE rte-self_reference) + hasSelfRefRTEs = true; + } + if (checkExprHasWindowFuncs(qry-jointree-quals)) ereport(ERROR, (errcode(ERRCODE_WINDOWING_ERROR), *** *** 426,431 --- 434,446 locate_windowfunc(expr; } } + + if (pstate-p_hasWindowFuncs hasSelfRefRTEs) + ereport(ERROR, + (errcode(ERRCODE_INVALID_RECURSION), +errmsg(window functions not allowed in a recursive query's recursive term), +parser_errposition(pstate, + locate_windowfunc((Node *) qry; } /* Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
David Rowley dgrow...@gmail.com writes: I've started running my test queries that I used when reviewing the patch. The following crashes the backend: Fixed, thanks. (begin_partition was failing to reset spooled_rows when falling out early because of empty outerplan; which could only cause an issue if the outerplan changed from nonempty to empty during a ReScan. This was true before, not sure why it failed to crash for you before...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada umi.tan...@gmail.com writes: I ran the patch witouht any errors. Though it's trivial, I noticed window_gettupleslot has to be fixed a bit. Yeah, it could uselessly spool the partition before failing. I think it had been that way before and I left it alone, but changing it is good --- diff included in my patch. I'm still working on the docs but expect to commit later today. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada wrote: WITH RECURSIVE bom AS ( SELECT parentpart,childpart,quantity,ROW_NUMBER() OVER (ORDER BY parentpart DESC) rn FROM billofmaterials WHERE parentpart = 'KITCHEN' UNION ALL SELECT b.parentpart,b.childpart,b.quantity,ROW_NUMBER() OVER (ORDER BY parentpart ASC) rn FROM billofmaterials b INNER JOIN bom ON b.parentpart = bom.childpart ) SELECT * from bom; It seems not to like recursively calling row_number(). It does not crash if I replace the 2nd row_number() with the constant 1 It seems that parseCheckWindowFuncs() doesn't check CTE case whereas parseCheckAggregates() checks it, including check of window functions. So the urgent patch is as following, but is this operation allowed? I am blind in CTE rules... I should have said that this is the first time I've seen a crash running this query. I only ever ran this query to verify that the backend didn't crash. I didn't ever pay much attention to the results. Do you have an older version that you can verify if it worked as it should have or not? Your final version won't patch with CVS head anymore. David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada umi.tan...@gmail.com writes: 2008/12/28 David Rowley dgrow...@gmail.com: I've started running my test queries that I used when reviewing the patch. The following crashes the backend: It seems that parseCheckWindowFuncs() doesn't check CTE case whereas parseCheckAggregates() checks it, including check of window functions. So the urgent patch is as following, but is this operation allowed? I am blind in CTE rules... Well, this certainly demonstrates that the check I added to parseCheckAggregates is wrongly placed, but I'm not sure we really need to forbid the case. David's example query seems to give sane answers once the bug in begin_partition is fixed: parentpart | childpart | quantity | rn +---+--+ KITCHEN| TABLE |1 | 1 KITCHEN| COOKER|1 | 2 KITCHEN| FRIDGE|1 | 3 TABLE | CHAIR |4 | 1 CHAIR | LEG |4 | 1 (5 rows) I basically went around and made sure everyplace that threw an error for aggregates also threw one for window functions, but it's quite possible that that's overly restrictive in some places. Window functions don't cause grouping so there's no reason to forbid them in places where it's the grouping behavior that is objectionable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
... and it's committed. Congratulations! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/29 Tom Lane t...@sss.pgh.pa.us: ... and it's committed. Congratulations! regards, tom lane Great! I am really glad I can contribute PostgreSQL project by such a big improvement. And I really thank all the hackers, without all the helps by you it wouldn't be done, obviously. Thank you. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/28 Tom Lane t...@sss.pgh.pa.us: I've spent quite a bit of time reviewing the window functions patch, and I think it is now ready to commit, other than the documentation (which I've not looked at yet at all). Attached is my current patch against HEAD, sans documentation. This incorporates the recently discussed aggregate-function API changes and support for tuplestore trimming. There's a number of things that could be improved yet: * we really ought to have some support for non-built-in window functions * I think the planner could be a bit smarter about when to sort or not * tuplestore_advance and related code really needs to be made more efficient; it didn't matter much before but it does now but I think these things can be worked on after the core patch is committed. I ran the patch witouht any errors. Though it's trivial, I noticed window_gettupleslot has to be fixed a bit. diff src/backend/executor/nodeWindowAgg.c.orig src/backend/executor/nodeWindowAgg.c 1445a1446,1449 /* pos = -1 means special on spool_tuples(), so check it before */ if (pos 0) return false; 1449c1453 if (pos = winstate-spooled_rows || pos 0) --- if (pos = winstate-spooled_rows) Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada umi.tan...@gmail.com writes: 2008/12/20 Tom Lane t...@sss.pgh.pa.us: I've been studying the grammar for the windowing patch a bit. It seems to me that the existing window name option for window specification got left out. I completely missed this issue. If the existing window name is allowed in window clause, then does it mean this is possible? SELECT row_number() OVER w2 FROM t WINDOW w1 AS (PARTITION BY grp), w2(w1) Yes, I read the spec to allow that. It's a bit pointless unless w2 modifies w1 somehow, though. And what if w1 refers to w2 and w2 refers to w1 cyclicly? Not allowed --- the scope of a window name doesn't include previous elements of the WINDOW list, so a forward reference isn't valid. And from what I read the spec, it seems to me that it effects only frame clause which is unlikely implemented for 8.4, because if existing window name) is specified then partition clause and order by clause are both permitted in the window definition. No, you're allowed to substitute a new order by clause. The useful case seems to be that you have a base definition giving a partitioning and then several derived windows with different orderings and/or different frame clauses. I'm not really sure why they bothered to set up the syntax like that, but the spec is pretty clear about it... I am also fairly strongly inclined to rip out all of the frame_clause syntax, since (1) it's unimplemented and unlikely to get implemented for 8.4, and (2) it's not particularly satisfactory anyway. The reason I added those grammer was 1) there had been possibilities to implement frame clause, and 2) to support not-implemented error. 1) is unlikey to be done and for 2) we need only rows/range syntax. So if it is annoying for 8.4 release, I don't disagree with your suggestion. I've been hacking on this and I have a grammar that pretty much works, but there's some bizarreness around UNBOUNDED. I'll post it later. There's still a a question whether we should introduce a lot of new keywords and productions now to support a feature we don't intend to implement in 8.4. There's a distributed speed cost for each keyword and each production; I don't really see the argument why users should pay even a small cost for zero benefit in this release. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
I wrote: I've been hacking on this and I have a grammar that pretty much works, but there's some bizarreness around UNBOUNDED. I'll post it later. Here is a proof-of-concept grammar patch that allows frame_bound to use a_expr instead of a hacked-up constant production (which, as I complained before, didn't allow all the cases demanded by the spec). The key changes involved are: * BETWEEN has to become a fully reserved word instead of type_func_name. * PARTITION RANGE ROWS FOLLOWING PRECEDING have to be assigned the same explicit precedence as IDENT. (This doesn't have any bad consequences AFAIK, except that you can't use a postfix operator in frame_bound exprs unless you parenthesize it.) * UNBOUNDED has to be assigned a precedence slightly lower than PRECEDING and FOLLOWING, else it's ambiguous whether UNBOUNDED PRECEDING ought to be parsed as a_expr PRECEDING. I'm a bit nervous about this solution; we might someday need to make some of these keywords at least partly reserved instead. But right now it doesn't seem to have any negative consequences. Making BETWEEN fully reserved is a bit annoying, but it would only break apps using BETWEEN as a type or function name. The former doesn't seem like a problem, the latter might be. Comments? regards, tom lane PS: I've removed some uninteresting hunks to shorten the diff, so you might get some chatter from patch if you try to apply the diff. *** src/backend/parser/gram.y Sat Dec 20 11:02:55 2008 --- ./gram.ySun Dec 21 14:56:17 2008 *** *** 158,163 --- 158,164 DefElem *defelt; OptionDefElem *optdef; SortBy *sortby; + WindowDef *windef; JoinExpr*jexpr; IndexElem *ielem; Alias *alias; *** *** 402,407 --- 403,414 %type with with_clause %type list cte_list + %type list window_clause window_definition_list opt_partition_clause + %type windefwindow_definition over_clause window_specification + %type str opt_existing_window_name + %type node opt_frame_clause frame_clause frame_extent + %type node frame_bound opt_frame_exclusion + /* * If you make any token changes, update the keyword table in *** *** 431,440 DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP ! EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EXCEPT EXCLUDING ! EXCLUSIVE EXECUTE EXISTS EXPLAIN EXTERNAL EXTRACT ! FALSE_P FAMILY FETCH FIRST_P FLOAT_P FOR FORCE FOREIGN FORWARD FREEZE FROM FULL FUNCTION GLOBAL GRANT GRANTED GREATEST GROUP_P --- 438,447 DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP ! EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EXCEPT EXCLUDE ! EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN EXTERNAL EXTRACT ! FALSE_P FAMILY FETCH FIRST_P FLOAT_P FOLLOWING FOR FORCE FOREIGN FORWARD FREEZE FROM FULL FUNCTION GLOBAL GRANT GRANTED GREATEST GROUP_P *** *** 461,475 NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF NULLS_P NUMERIC OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OPTIONS OR ! ORDER OUT_P OUTER_P OVERLAPS OVERLAY OWNED OWNER ! PARSER PARTIAL PASSWORD PLACING PLANS POSITION ! PRECISION PRESERVE PREPARE PREPARED PRIMARY PRIOR PRIVILEGES PROCEDURAL PROCEDURE QUOTE ! READ REAL REASSIGN RECHECK RECURSIVE REFERENCES REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA RESET RESTART RESTRICT RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROW ROWS RULE --- 468,482 NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF NULLS_P NUMERIC OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OPTIONS OR ! ORDER OTHERS OUT_P OUTER_P OVER OVERLAPS OVERLAY OWNED OWNER ! PARSER PARTIAL PARTITION PASSWORD PLACING PLANS POSITION ! PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY PRIOR PRIVILEGES PROCEDURAL PROCEDURE QUOTE ! RANGE READ REAL REASSIGN RECHECK RECURSIVE REFERENCES REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA RESET RESTART RESTRICT RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROW ROWS RULE *** *** 479,495 STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING SUPERUSER_P SYMMETRIC SYSID SYSTEM_P ! TABLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P TRUNCATE TRUSTED TYPE_P !
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/20 Tom Lane t...@sss.pgh.pa.us: Hitoshi Harada umi.tan...@gmail.com writes: Here's the patch, including latest function default args. Regards, The comments added to planner.c contain various claims that setrefs.c will fix up window function references, but I see no code related to that in setrefs.c. Please explain. The Window node handles only expressions that contain corresponding window functions. Since more than one Window nodes may be put in a Plan, the window functions in the upper node are not evaluated in the lower node but tlist should not forget it exists in the lower. For examples, SQL: SELECT row_number() OVER (ORDER BY salary) AS r1, row_number() OVER (ORDER BY depname) AS r2 FROM empsalary; The final Plan: -Window (upper) pull r1 (Var) eval r2 (WindowFuncExpr) -Sort by depname -Window (lower) eval r1 (WindowFunctionExpr) r2 as NULL (Const) -Sort by salary -Scan empsalary And it is possible for window_tlist() to change upper's r1 to Var but delegating it to set_upper_references() seems right to do.If we simply delegate all the nodes to set_upper_reference(), r2 would also be evaluated in lower node, so we put null const in the lower and put actual evaluation in the upper. So window_preprocess and window_tlist do things needed for set_upper_reference to fix up all the expressions correctly, that's why no code is added to setrefs.c. This part is hard to describe in english (or even in my japanese) so if you are still lost, I'll add more explanation. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/20 Tom Lane t...@sss.pgh.pa.us: I've been studying the grammar for the windowing patch a bit. It seems to me that the existing window name option for window specification got left out. I think that WindowDef needs to have two name fields, one for the name (if any) defined by the window definition, and one for the referenced window name if any. Also the OVER name syntax seems like it maps to a referenced window name not a defined window name. I completely missed this issue. If the existing window name is allowed in window clause, then does it mean this is possible? SELECT row_number() OVER w2 FROM t WINDOW w1 AS (PARTITION BY grp), w2(w1) And what if w1 refers to w2 and w2 refers to w1 cyclicly? And from what I read the spec, it seems to me that it effects only frame clause which is unlikely implemented for 8.4, because if existing window name) is specified then partition clause and order by clause are both permitted in the window definition. I am also fairly strongly inclined to rip out all of the frame_clause syntax, since (1) it's unimplemented and unlikely to get implemented for 8.4, and (2) it's not particularly satisfactory anyway. The frame_bound_const business has to be rethought, because it's ugly and it doesn't even satisfy the limited requirements of the spec (cf general value specification which is an allowed alternative for unsigned value specification). It might be that we'll end up having to make BETWEEN be a fully reserved word in order to implement that syntax. The reason I added those grammer was 1) there had been possibilities to implement frame clause, and 2) to support not-implemented error. 1) is unlikey to be done and for 2) we need only rows/range syntax. So if it is annoying for 8.4 release, I don't disagree with your suggestion. BTW, I notice we also have not implemented the spec's bizarre RESPECT NULLS/IGNORE NULLS decoration for lead/lag or FROM FIRST/LAST for nth_value. This is fine with me, but we probably ought to document workarounds for that, if possible. I guess FROM FIRST/LAST could be handled by reversing the window sort order; what about the NULLS stuff? For function-specific syntax seems like more big discussion. I heard that array_agg() may have ORDER BY clause in its arguments but currently not included in pgsql, whereas extract() is supported as special case. I think we might build general rule for those and until it is completed, RESPECT NULLS/IGNORENULLS and others can be ignored. Of course it must be documented though. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/21 Hitoshi Harada umi.tan...@gmail.com: 2008/12/20 Tom Lane t...@sss.pgh.pa.us: I've been studying the grammar for the windowing patch a bit. It seems to me that the existing window name option for window specification got left out. I think that WindowDef needs to have two name fields, one for the name (if any) defined by the window definition, and one for the referenced window name if any. Also the OVER name syntax seems like it maps to a referenced window name not a defined window name. I completely missed this issue. If the existing window name is allowed in window clause, then does it mean this is possible? SELECT row_number() OVER w2 FROM t WINDOW w1 AS (PARTITION BY grp), w2(w1) And what if w1 refers to w2 and w2 refers to w1 cyclicly? And from what I read the spec, it seems to me that it effects only frame clause which is unlikely implemented for 8.4, because if existing window name) is specified then partition clause and order by clause are both permitted in the window definition. both not permitted in the window definition. Sorry for my mistake. -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada umi.tan...@gmail.com writes: Here's the patch, including latest function default args. Regards, The comments added to planner.c contain various claims that setrefs.c will fix up window function references, but I see no code related to that in setrefs.c. Please explain. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
I've been studying the grammar for the windowing patch a bit. It seems to me that the existing window name option for window specification got left out. I think that WindowDef needs to have two name fields, one for the name (if any) defined by the window definition, and one for the referenced window name if any. Also the OVER name syntax seems like it maps to a referenced window name not a defined window name. I am also fairly strongly inclined to rip out all of the frame_clause syntax, since (1) it's unimplemented and unlikely to get implemented for 8.4, and (2) it's not particularly satisfactory anyway. The frame_bound_const business has to be rethought, because it's ugly and it doesn't even satisfy the limited requirements of the spec (cf general value specification which is an allowed alternative for unsigned value specification). It might be that we'll end up having to make BETWEEN be a fully reserved word in order to implement that syntax. BTW, I notice we also have not implemented the spec's bizarre RESPECT NULLS/IGNORE NULLS decoration for lead/lag or FROM FIRST/LAST for nth_value. This is fine with me, but we probably ought to document workarounds for that, if possible. I guess FROM FIRST/LAST could be handled by reversing the window sort order; what about the NULLS stuff? Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/8 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: That said, we should try to get this committed ASAP, so I think we can live without the trimming for 8.4. Just let me know. What is the current status... Is there something for me to do now? Or only wating? Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/6 David Rowley [EMAIL PROTECTED]: I've spent last night and tonight trying to break the patch and I've not managed it. I spent 2 and a half hours on the train last night reading over the patch mainly for my own interest. I also went over the documentation and I have a few suggestions for improvement: +para + After literalWHERE/ and literalGROUP BY/ process, + rows might be windowed table, using the literalWINDOW/ + clause. +/para I think I know what you mean here. My re-write seems to have turned the sentence into a paragraph. Please tell me if I've assumed the meaning wrongly: After the literalWHERE/, literalGROUP BY/ and literalHAVING/ clauses one or more literalWINDOW/ clauses can be specified. This will allow window functions to be specified in the literalSELECT/ clause. These window functions can make use of the literalWINDOW/ clauses by making reference to the alias name of the window rather than explicitly specifying the properties of the window in each literalOVER/ clause. The WINDOW clause is a clause that starts with WINDOW, containing some window definitions, syntactically. So I rewrote it as: After the literalWHERE/, literalGROUP BY/ and literalHAVING/ clauses one or more window definitions can be specified by the literalWINDOW/ clause. This will allow window functions to be specified in the literalSELECT/ clause. These window functions can make use of the literalWINDOW/ clauses by making reference to the alias name of the window rather than explicitly specifying the properties of the window in each literalOVER/ clause. + Window functions are not placed in any of GROUP BY, HAVING and + WHERE clauses, which process values before any of the windows. If + there is need to qualify rows by the result of window functions, + whole of the query must be nested and append WHERE clause outer of + the current query. I think this one maybe needs an example to back it up. It's quite an important thing and I'm sure lots of people will need to do this. I'm not 100% happy with my new paragraph either but can't see how to word it any better. Window functions cannot be used in the WHERE, GROUP BY or HAVING clauses of the query. If there is a need to filter rows, group results or filter rows after aggregation takes place (HAVING) then the query must be nested. The query should contain the window functions in the inner query and apply the additional clauses that contain the results from the window function in the outer query, such as: SELECT depname, empno, salary, enroll_date FROM (SELECT depname, empno, salary, enroll_date, ROW_NUMBER() OVER (PARTITION BY depname ORDER BY salary,empno) AS pos FROM empsalary ) AS e WHERE pos = 1; In the above query the we're filtering and only showing the results from the inner query where the ROW_NUMBER() value is equal to 1. But of course the above query would be more simple using DISTINCT ON. Maybe there is a better example... My previous marathon getting the person in 2nd place might be better but that's introducing another previously unknown table to the manual. I use this query: SELECT depname, empno, salary, enroll_date FROM (SELECT depname, empno, salary, enroll_date, ROW_NUMBER() OVER (PARTITION BY depname ORDER BY salary,empno) AS pos FROM empsalary ) AS e WHERE pos 3; This isn't emulated by DISTINCT ON, is it? For all other issues, thanks, applied to my patch. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/6 David Rowley [EMAIL PROTECTED]: Hitoshi Harada Wrote: 2008/12/3 Hitoshi Harada [EMAIL PROTECTED]: I am randomly trying some issues instead of agg common code (which I now doubt if it's worth sharing the code), so tell me if you're restarting your hack again. I'll send the whole patch. Attached is the updated patch, including: - performance tuning up for large data sets - move find_wfunc to optimizer/util/clauses.c - rename WFunc to WindowFunc - rename WinDef to WindowDef - rename wfunc-pure_agg to winagg - remove winstate-tail_ptr - fix WinFrameGetArg in case that there are more than one peer, add relevant test - duplicate GetAggInitVal(), not sharing aggregate routines with nodeAgg.c I believe the remaining work is only to optimize row_number()/rank() cases to trim tuplestore like window aggregates. This will be done by providing some kind of way for each window functions to tell Window node that it doesn't require backward_scan. I've spent hours to try this issue, and concluded it doesn't pay. First, the test is on this query: explain analyze select id, row_number() OVER (order by id) from bigtable order by id; where bigtable has ~400MB, 1000 rows. This simple row_number() query takes about 50 sec, whereas without row_number() indexscan query takes about 25 sec. I wondered what makes the difference 25 sec. With this test, the tuplestore dumps its tuples since it never trims. Then I took profile of nodeWindow in some points, tuplestore_puttupleslot 13.6 sec spool_tuples37.9 sec eval_windowfunction 9.3 sec Note that spool_tuples contains execProc(outerPlan), which means its 37.9 sec contains outer indexscan 25 sec, plus tuplestore_puttuple, 13.9 sec. Then I modified some code to let tuplestore trim and tested again then the results were: tuplestore_puttupleslot 11.2 sec spool_tuples35.8 sec eval_windowfunction 9.5 sec It shows even though tuplestore trims its tuples and stays in memory rather than dumps them on files, the performance up is only 2 sec in 50 sec. So I concluded the optimization for row_number()/rank() etc doesn't pay for its more complexity in window function API. The bottleneck of the Window node origins from something else, like puttupleslot(), not Window node algorithm. Of course those issues should be tracked more precisely, for the window functions work I ignore them. It's been a long time since the CommitFest started. This patch has come a long way in that time. We've seen various performance improvements and many fixes where the patch was not matching the standard. We've also seen quite a big change in the window buffering technique which is showing amazing performance improvements in certain queries. I've spent many hours reading the standard and comparing the results from this patch against other RDBMS' that support window functions. I wonder if we're the first to implement NTH_VALUE()?. The patch, in my opinion is getting very close to being committable. The only outstanding issues; Please correct me where I'm wrong or have omitted something. * Fix still required for rank_up. Code still has a FIXME. This was whether rank() without ORDER BY clause should be valid or not. The answer is yes as it is implemented now, so I removed only comments. * ROW_NUMBER() and RANK() performance to be looked into. As I tested above. * Slight changes required in documentation (my previous email). Applied to the patch. * Final code read by someone reviewing the code. I am looking forward. I've also spent many hours trying to break this patch and in previous versions I've managed it. Last night I spent a few hours again testing this new patch and did not managed to find anything wrong. We're getting close to the time where the community can test further by committing this patch. Agree. I'll send the latest patch and finish my work for now. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/7 Hitoshi Harada [EMAIL PROTECTED]: 2008/12/7 Hitoshi Harada [EMAIL PROTECTED]: 2008/12/6 David Rowley [EMAIL PROTECTED]: the time where the community can test further by committing this patch. Agree. I'll send the latest patch and finish my work for now. Here's the patch, including latest function default args. Regards, I've added a link to the commitfest page and stated that the patch is ready for a core member to review. Good work. David. -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada wrote: It shows even though tuplestore trims its tuples and stays in memory rather than dumps them on files, the performance up is only 2 sec in 50 sec. So I concluded the optimization for row_number()/rank() etc doesn't pay for its more complexity in window function API. The bottleneck of the Window node origins from something else, like puttupleslot(), not Window node algorithm. Of course those issues should be tracked more precisely, for the window functions work I ignore them. The negative impact of not trimming the tuplestore comes from having to do I/O to write the tuples to temporary file. I would've expected that to show up with 400 MB table, but maybe that still fits comfortably in OS cache. Anyway, I would expect there to be a big drop in performance after the tuplestore no longer fits in cache, and trimming it would eliminate that. That said, we should try to get this committed ASAP, so I think we can live without the trimming for 8.4. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada wrote: I tested the spool performance with David's earlier bigtable: CREATE TABLE bigtable ( id SERIAL NOT NULL PRIMARY KEY, timestamp TIMESTAMP NOT NULL ); -- about 383MB of data INSERT INTO bigtable (timestamp) SELECT NOW() + (CAST(RANDOM() * 10 AS INT) || ' secs')::INTERVAL FROM generate_series(1,1000); CREATE INDEX bigtable_timestamp_idx ON bigtable (timestamp); VACUUM ANALYZE bigtable; sample=# SELECT COUNT(*) FROM bigtable; count -- 1000 (1 row) sample=# SELECT LEAD(timestamp) OVER (ORDER BY id) FROM bigtable LIMIT 1; lead 2008-12-02 00:15:10.288461 (1 row) sample=# EXPLAIN ANALYZE SELECT LEAD(timestamp) OVER (ORDER BY id) FROM bigtable LIMIT 1; QUERY PLAN -- --- Limit (cost=0.00..0.04 rows=1 width=12) (actual time=0.038..0.039 rows=1 loops=1) - Window (cost=0.00..386612.13 rows=1000 width=12) (actual time=0.036..0.036 rows=1 loops=1) - Index Scan using bigtable_pkey on bigtable (cost=0.00..286612.13 rows=1000 w idth=12) (actual time=0.018..0.021 rows=2 loops=1) Total runtime: 0.071 ms (4 rows) shows quite good result. Great work. Amazing improvement! Old patch: david=# select timestamp,lag(timestamp,1) over (order by id) from bigtable order by id limit 1; timestamp | lag +- 2008-11-10 21:55:16.498458 | (1 row) Time: 105205.055 ms New patch: david=# select timestamp,lag(timestamp,1) over (order by id) from bigtable order by id limit 1; timestamp | lag +- 2008-12-04 22:05:22.687975 | (1 row) Time: 1.640 ms The following query works on my build: SELECT depname,SUM(SUM(salary)) OVER (ORDER BY depname) FROM empsalary GROUP BY depname; ERROR: variable not found in subplan target list This works fine on my build now too. Now, I am thinking about refactoring around aggregate common code, and renaming WFunc to WinFunc, which leads pg_proc.proiswfunc be pg_proc.proiswinfunc and so on if no objections come. I've spent last night and tonight trying to break the patch and I've not managed it. I spent 2 and a half hours on the train last night reading over the patch mainly for my own interest. I also went over the documentation and I have a few suggestions for improvement: My modifications to the lead and lag syntax can be improved. Currently the optional parameters make it look like DEFAULT can be specified without OFFSET. This is not the case: + typeany, [integer], [any]/type Should be: + typeany [,integer [,any] ]/type And: + lag(replaceable class=parametervalue/replaceable, [replaceable + class=parameteroffset/replaceable], [replaceable + class=parameterdefault/replaceable]) Should be: + lag(replaceable class=parametervalue/replaceable [, replaceable + class=parameteroffset/replaceable [,replaceable + class=parameterdefault/replaceable] ]) Same for LEAD() +para + After literalWHERE/ and literalGROUP BY/ process, + rows might be windowed table, using the literalWINDOW/ + clause. +/para I think I know what you mean here. My re-write seems to have turned the sentence into a paragraph. Please tell me if I've assumed the meaning wrongly: After the literalWHERE/, literalGROUP BY/ and literalHAVING/ clauses one or more literalWINDOW/ clauses can be specified. This will allow window functions to be specified in the literalSELECT/ clause. These window functions can make use of the literalWINDOW/ clauses by making reference to the alias name of the window rather than explicitly specifying the properties of the window in each literalOVER/ clause. +para + Another expample shows different capability of window functions + from above. Small typo: example instead of eapample +para + The same window definitions can be named and put togather into one + definition using xref linkend=queries-window. Small typo: together instead of togather. +para + Window functions doesn't accept DISTINCT and ALL syntax, even though + the function is an aggregate function. +/para Small grammar error: doesn't should be replaced with don't. Or perhaps change to: Window functions, unlike normal aggregate functions, do not allow DISTINCT or ALL to be used within the function argument list. + Window functions are not placed in any of GROUP BY, HAVING and + WHERE clauses, which process values before any of the windows. If + there is need to qualify rows by the result of window functions, + whole of the query must be nested and append WHERE clause outer of + the current query. I think this one maybe needs an example to back it up. It's quite an
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada Wrote: 2008/12/3 Hitoshi Harada [EMAIL PROTECTED]: I am randomly trying some issues instead of agg common code (which I now doubt if it's worth sharing the code), so tell me if you're restarting your hack again. I'll send the whole patch. Attached is the updated patch, including: - performance tuning up for large data sets - move find_wfunc to optimizer/util/clauses.c - rename WFunc to WindowFunc - rename WinDef to WindowDef - rename wfunc-pure_agg to winagg - remove winstate-tail_ptr - fix WinFrameGetArg in case that there are more than one peer, add relevant test - duplicate GetAggInitVal(), not sharing aggregate routines with nodeAgg.c I believe the remaining work is only to optimze row_number()/rank() cases to trim tuplestore like window aggregates. This will be done by providing some kind of way for each window functions to tell Window node that it doesn't require backward_scan. It's been a long time since the CommitFest started. This patch has come a long way in that time. We've seen various performance improvements and many fixes where the patch was not matching the standard. We've also seen quite a big change in the window buffering technique which is showing amazing performance improvements in certain queries. I've spent many hours reading the standard and comparing the results from this patch against other RDBMS' that support window functions. I wonder if we're the first to implement NTH_VALUE()?. The patch, in my opinion is getting very close to being committable. The only outstanding issues; Please correct me where I'm wrong or have omitted something. * Fix still required for rank_up. Code still has a FIXME. * ROW_NUMBER() and RANK() performance to be looked into. * Slight changes required in documentation (my previous email). * Final code read by someone reviewing the code. I've also spent many hours trying to break this patch and in previous versions I've managed it. Last night I spent a few hours again testing this new patch and did not managed to find anything wrong. We're getting close to the time where the community can test further by committing this patch. I'll make a reference to this post in the CommitFest list for any non-followers of this thread. David. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/2 Hitoshi Harada [EMAIL PROTECTED]: sample=# EXPLAIN ANALYZE SELECT LEAD(timestamp) OVER (ORDER BY id) FROM bigtable LIMIT 1; QUERY PLAN -- --- Limit (cost=0.00..0.04 rows=1 width=12) (actual time=0.038..0.039 rows=1 loops=1) - Window (cost=0.00..386612.13 rows=1000 width=12) (actual time=0.036..0.036 rows=1 loops=1) - Index Scan using bigtable_pkey on bigtable (cost=0.00..286612.13 rows=1000 w idth=12) (actual time=0.018..0.021 rows=2 loops=1) Total runtime: 0.071 ms (4 rows) shows quite good result. Great work. After more playing with the new patch, I found worse results. sample=# explain analyze select id, row_number() OVER (order by id) from bigtable order by id; QUERY PLAN -- --- Window (cost=0.00..361612.13 rows=1000 width=4) (actual time=0.064..105414.522 rows=1000 loops=1) - Index Scan using bigtable_pkey on bigtable (cost=0.00..286612.13 rows=1000 width=4 ) (actual time=0.056..16836.341 rows=1000 loops=1) Total runtime: 114650.074 ms (3 rows) sample=# explain analyze select id,LAG(timestamp,1) over (order by id) from bigtable order by id; QUERY PLAN -- Window (cost=0.00..411612.13 rows=1000 width=12) (actual time=0.065..122583.331 rows=100 0 loops=1) - Index Scan using bigtable_pkey on bigtable (cost=0.00..286612.13 rows=1000 width=1 2) (actual time=0.056..18066.829 rows=1000 loops=1) Total runtime: 132770.399 ms (3 rows) The earlier patch results are here: http://archives.postgresql.org/pgsql-hackers/2008-11/msg01121.php row_number(): 44s/114s lag(): 79s/132s I don't understand the new patch totally, and I know the row_number() optimization is in progress, but even lag() is quite worse. Maybe tuplestore read pointer's heavy uses cause these. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/26 Heikki Linnakangas [EMAIL PROTECTED]: Hitoshi Harada wrote: I read more, and your spooling approach seems flexible for both now and the furture. Looking at only current release, the frame with ORDER BY is done by detecting peers in WinFrameGetArg() and add row number of peers to winobj-currentpos. Actually if we have capability to spool all rows we need on demand, the frame would be only a boundary problem. Yeah, we could do that. I'm afraid it would be pretty slow, though, if there's a lot of peers. That could probably be alleviated with some sort of caching, though. I added code for this issue. See http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=blobdiff;f=src/backend/executor/nodeWindow.c;h=f2144bf73a94829cd7a306c28064fa5454f8d369;hp=50a6d6ca4a26cd4854c445364395ed183b61f831;hb=895f1e615352dfc733643a701d1da3de7f91344b;hpb=843e34f341f0e824fd2cc0f909079ad943e3815b This process is very similar to your aggregatedupto in window aggregate, so they might be shared as general the way to detect frame boundary, aren't they? I am randomly trying some issues instead of agg common code (which I now doubt if it's worth sharing the code), so tell me if you're restarting your hack again. I'll send the whole patch. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
-Original Message- From: Heikki Linnakangas [mailto:[EMAIL PROTECTED] Sent: 26 November 2008 09:09 To: Hitoshi Harada Cc: David Rowley; pgsql-hackers@postgresql.org Subject: Re: Windowing Function Patch Review - Standard Conformance Hitoshi Harada wrote: 2008/11/26 David Rowley [EMAIL PROTECTED]: I'm at a bit of a loss to what to do now. Should I wait for your work Heikki? Or continue validating this patch? The best thing I can think to do right now is continue and any problems I find you can add regression tests for, then if we keep your regression tests for Heikki's changes then we can validate those changes more quickly. Any thoughts? Better ideas? Thanks to your great tests, we now know much more about specification and where to fail easily, so continuing makes sense but it may be good time to take a rest and wait for Heikki's patch completing. Here's another updated patch, including all your bug fixes. There's two known issues: - ranking functions still don't treat peer rows correctly. - I commented out the this function requires ORDER BY clause in the window test in rank_up, because a window function shouldn't be poking into the WindowState struct like that. I wonder if it's really needed? In section 7.11, the SQL2008 spec says if WD has no window ordering clause, then the window ordering is implementation-dependent, and *all rows are peers*. The regression test now fails because of this, but the current behavior actually seems correct to me. Thanks for the updated patch. I've been doing a little testing over the weekend. I'm running into a small problem with passing results from normal aggregate functions into the window function. Hitoshi and I saw this before: SELECT depname,SUM(SUM(salary)) OVER (ORDER BY depname) FROM empsalary GROUP BY depname; ERROR: variable not found in subplan target list Hitoshi fixed this in his 2008-11-12 patch, but it seems to have found its way back in. I was also reading over the standard tonight. I've discovered that the OFFSET in LEAD() and LAG() is optional. It should default to 1 if it is not present. Oracle seems to support this. SQL2008 says: If lead or lag function is specified, then: i) Let VE1 be lead or lag extent and let DT be the declared type of VE1. ii) Case: Scalar expressions 209 WD 9075-2:200w(E) 6.10 window function If offset is specified, then let OFF be offset. The declared type of OFF shall be an exact numeric type with scale 0 (zero). 1) 2) Otherwise, let OFF be 1 (one). Yet another variant of LEAD() and LAG() but I think well worth it for both compliance to the standard and compatibility to Oracle. I've also been thinking about the on-demand buffering for the window functions that you talked about. In some of my previous performance tests I noticed that LEAD with a LIMIT clause is not optimal as it will store all the rows in the tuplestore before applying the LIMIT: select timestamp,lead(timestamp,1) over (order by id) from bigtable order by id limit 1; Is there any way for the on-demand buffering to make the above query faster? Really only the first 2 rows ordered by id need to be read. I had previously thought this would be too difficult to implement as the OFFSET can be variable, but maybe it's possible on-demand. Yet I'd imagine it's difficult to know when to allow rows to exit the tuplestore as a LAG(col, someVariableValue) might need those rows later. My next task is to read more of the standard just in case there is anything else we should know. David. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
I wrote: I was also reading over the standard tonight. I've discovered that the OFFSET in LEAD() and LAG() is optional. It should default to 1 if it is not present. Oracle seems to support this. SQL2008 says: If lead or lag function is specified, then: i) Let VE1 be lead or lag extent and let DT be the declared type of VE1. ii) Case: Scalar expressions 209 WD 9075-2:200w(E) 6.10 window function If offset is specified, then let OFF be offset. The declared type of OFF shall be an exact numeric type with scale 0 (zero). 1) 2) Otherwise, let OFF be 1 (one). Yet another variant of LEAD() and LAG() but I think well worth it for both compliance to the standard and compatibility to Oracle. I figured this was quite simple so I've created a patch to implement this. Can probably put this down to the fact that I'm starting to feel bad about pointing out the mistakes and having someone else fix them. Figured it was time to make some changes myself. I've got limited experience with diff so please let me know if there is something wrong with the patch. Same goes for my changes to the code. I re-sequenced the OIDs of other window functions so it will require initdb. Also I made some updates to the documentation. Wasn't 100% sure on the syntax for the optional arguments there. Hitoshi had: arg1 [,optional1]. I've changed this to arg, [optional1], [optional2]. One thing I didn't do was update the regression test: SELECT oid, proname FROM pg_proc WHERE proiswfunc; Hopefully this patch will apply after applying Heikki's latest patch (version 3). If you're happy with this Heikki can you merge to your patch? David windowfunc_nooffset1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
I wrote: Hmm, did you apply the latest patch correctly? My build can produce right results, so I don't see why it isn't fixed. Make sure the lines around 2420-2430 in planner.c like: /* * must copyObject() to avoid args concatenating with each other. */ pulled_exprs = list_concat(pulled_exprs, copyObject(wfunc-args)); where copyObject() is added. I'm sitting here away from home with a funny feeling I forgot to make install. I think my home adsl has dropped out so can't confirm that. If it works for you and not for me last night then I probably did forget. I'll let you know. Sorry, false alarm. What can I say, it was late. I did make install, just to the wrong place, so ended up using the old binary. Sorry for panic. The bug is fixed. Only I still get regression failures on window. Do they all pass for you? I have most of the weekend to test Heikki's work, in fact it's building now. But I'll have to leave it running... David. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
David Rowley wrote: I've created a query that uses the table in your regression test. max_salary1 gives incorrect results. If you remove the max_salary2 column it gives the correct results. Thanks. I saw this myself yesterday, while hacking on the patch. I thought it was a bug I had introduced, but apparently it was there all along. Anyway, fixed in the latest version I will send shortly. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/26 Heikki Linnakangas [EMAIL PROTECTED]: Hitoshi Harada wrote: 2008/11/26 David Rowley [EMAIL PROTECTED]: I'm at a bit of a loss to what to do now. Should I wait for your work Heikki? Or continue validating this patch? The best thing I can think to do right now is continue and any problems I find you can add regression tests for, then if we keep your regression tests for Heikki's changes then we can validate those changes more quickly. Any thoughts? Better ideas? Thanks to your great tests, we now know much more about specification and where to fail easily, so continuing makes sense but it may be good time to take a rest and wait for Heikki's patch completing. Here's another updated patch, including all your bug fixes. There's two known issues: - ranking functions still don't treat peer rows correctly. - I commented out the this function requires ORDER BY clause in the window test in rank_up, because a window function shouldn't be poking into the WindowState struct like that. I wonder if it's really needed? In section 7.11, the SQL2008 spec says if WD has no window ordering clause, then the window ordering is implementation-dependent, and *all rows are peers*. The regression test now fails because of this, but the current behavior actually seems correct to me. Yes, I was wrong. The reason I put the error in rank() without ORDER BY is nothing but I didn't find it. It is actually a reasonable specification, isn't it. This is tiny thing, but negative transition function can be called inverse transition function? I feel the latter is more readable. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/26 Heikki Linnakangas [EMAIL PROTECTED]: Hitoshi Harada wrote: I read more, and your spooling approach seems flexible for both now and the furture. Looking at only current release, the frame with ORDER BY is done by detecting peers in WinFrameGetArg() and add row number of peers to winobj-currentpos. Actually if we have capability to spool all rows we need on demand, the frame would be only a boundary problem. Yeah, we could do that. I'm afraid it would be pretty slow, though, if there's a lot of peers. That could probably be alleviated with some sort of caching, though. It seems to me that eval_windowaggregate() also should use frame APIs. Only things we have to care is the shrinking frame, which is not supported in this release. So I'd suggest winobj-aggregatedupto to be removed. Is there objection? Actually, I took a different approach in the latest patch. Window aggregates now use a separate read pointer into the tuplestore. The current row is also read using a separate read pointer in ExecWindow. The aggregate and current row read pointers don't need random access, which has the nice effect that if you only use window aggregates, not window functions, the tuplestore doesn't need random access, and doesn't need to spill to disk as long as the window frame fits in work_mem. We should still figure out how to make it possible to trim the tuplestore, when window functions that don't need random access are used. Like ROW_NUMBER and RANK. Earlier, I thought we should add function to the window object API to explicitly tell that tuples before row X are no longer needed. But I'm now starting to wonder if we should design the window object API more like the tuplestore API, with a read pointer that you can advance forward or backward, and rewind. That would probably map more naturally to the underlying tuplestore, and it seems like it would be just as easy to use in all the existing window functions. Complete solution, at least for the current release. I now figure out exactly what the tuplestore_trim does. So currentrow pointer doesn't need go backward, neither does extending frame's aggregate pointer, row_number, rank, etc. Cutting off frame's aggregate need random row, so does lead, lag, etc. Even there were random access, it's very flexible in triming and saving memory. Little concern is some operations like WinRowIsPeer() doesn't know if the required row is trimmed already, which isn't big problem in the existing functions. Now you might think about sharing aggregate code like initialize/advance/finalize. If you want I can refactor in nodeAgg.c to be able sharing with nodeWindow.c, which wouldn't conflict with your work. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada wrote: I read more, and your spooling approach seems flexible for both now and the furture. Looking at only current release, the frame with ORDER BY is done by detecting peers in WinFrameGetArg() and add row number of peers to winobj-currentpos. Actually if we have capability to spool all rows we need on demand, the frame would be only a boundary problem. Yeah, we could do that. I'm afraid it would be pretty slow, though, if there's a lot of peers. That could probably be alleviated with some sort of caching, though. It seems to me that eval_windowaggregate() also should use frame APIs. Only things we have to care is the shrinking frame, which is not supported in this release. So I'd suggest winobj-aggregatedupto to be removed. Is there objection? Actually, I took a different approach in the latest patch. Window aggregates now use a separate read pointer into the tuplestore. The current row is also read using a separate read pointer in ExecWindow. The aggregate and current row read pointers don't need random access, which has the nice effect that if you only use window aggregates, not window functions, the tuplestore doesn't need random access, and doesn't need to spill to disk as long as the window frame fits in work_mem. We should still figure out how to make it possible to trim the tuplestore, when window functions that don't need random access are used. Like ROW_NUMBER and RANK. Earlier, I thought we should add function to the window object API to explicitly tell that tuples before row X are no longer needed. But I'm now starting to wonder if we should design the window object API more like the tuplestore API, with a read pointer that you can advance forward or backward, and rewind. That would probably map more naturally to the underlying tuplestore, and it seems like it would be just as easy to use in all the existing window functions. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Heikki Linnakangas [EMAIL PROTECTED] writes: Here's another updated patch, including all your bug fixes. I did a very fast pass through this and have a few comments. * Don't bother manually updating keywords.sgml. As stated therein, that table is automatically generated. All you'll accomplish is to cause merge problems. (The effort would be a lot better spent on the non-boilerplate parts of the docs anyway.) * I assume there's going to be some way to create user-defined window functions? * It seems fairly unlikely that you can get away with not supporting any qual expression on a Window plan node. What will you do with HAVING qualifiers? * The find_aggref code added to planagg.c (where it doesn't belong anyway) doesn't seem to be used anywhere. * In the same vein, I'm unimpressed with moving GetAggInitVal into execGrouping.c, which it isn't at all related to. I'd just leave it alone and duplicate the code in nodeWindow.c --- it's not exactly large. If you did insist on sharing this code it would be appropriate to change the name and comments to reflect the fact that it's being used for more than just aggregates, anyhow. * And in the same vein. var.c is hardly the place to put a search-for-wfuncs routine. * It seems like a coin was flipped to determine whether struct and field names would use window, win, or just w (I find WFunc to be particularly unhelpful to a reader who doesn't already know what it is). Please try to reduce the surprise factor. I'd suggest consistently using window in type names, though win is an OK prefix for field names within window-related structs. * This is a bad idea: /* + * OrderClause - + * representation of ORDER BY in Window + */ + typedef SortGroupClause OrderClause; + + + /* + * PartitionClause - + * representaition of PATITION BY in Window + */ + typedef SortGroupClause PartitionClause; If they're just SortGroupClauses, call them that, don't invent an alias. (Yes, I know SortClause and GroupClause used to be aliases. That was a bad idea: it confused matters and required lots of useless duplicated code, except for the places where we didn't duplicate code because we were willing to assume struct equivalence. There's basically just nothing that wins about that approach.) In any case, order and partition are really bad names to be using here given the number of possible other meanings for those terms in a DBMS context. If you actually need separate struct types then names like WindowPartitionClause would be appropriate. * The API changes chosen for func_get_detail seem pretty bizarre. Why didn't you just add a new return code FUNCDETAIL_WINDOW? * The node support needs to be gone over more closely. I noticed just in random checking that WFunc is missing parse-location support in nodeFuncs.c and the Query.hasWindow field got added to copyfuncs, outfuncs, readfuncs, but not equalfuncs. * Please heed the comment at the top of parallel_schedule about the max number of tests per parallel group. * I don't find the test added to opr_sanity.sql to be particularly sane. We *will* have the ability to add window functions. But it might be helpful to check that proisagg and proiswfunc aren't both set. * errcodes.h is not the only place that has to be touched to add a new error code --- see also sgml/errcodes.sgml, plpgsql/src/plerrcodes.h. And what is your precedent for using 42813? I don't see that in the standard. If it's coming from DB2 it's okay, otherwise we should be using a private 'P' code. * Please try to eliminate random whitespace changes from the patch ... *particularly* in files that otherwise wouldn't be touched at all (there are at least two cases of that in this patch) That's all I have time for right now ... more to come no doubt. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
On 26/11/2008, Hitoshi Harada [EMAIL PROTECTED] wrote: 2008/11/26 David Rowley [EMAIL PROTECTED]: Hitoshi Harada wrote: 2008/11/20 David Rowley [EMAIL PROTECTED]: -- The following query gives incorrect results on the -- maxhighbid column SELECT auctionid, category, description, highestbid, reserve, MAX(highestbid) OVER (ORDER BY auctionid) AS maxhighbid, MAX(reserve) OVER() AS highest_reserve FROM auction_items; If you remove the highest_reserve column you get the correct results. The bug also affects MIN, AVG, COUNT, STDDEV but not SUM. Good report! I fixed this bug, which was by a trival missuse of list_concat() in the planner. This was occurred when the first aggregate trans func is strict and the second aggregate argument may be null. Yep, the argument of the second was implicitly passed to the first wrongly. That's why it didn't occur if you delete the second MAX(). I added a test case with the existing data emulating yours (named strict aggs) but if it is wrong, let me know. It's not quite right yet. I'm also getting regression tests failing on window. Let me know if you want the diffs. I've created a query that uses the table in your regression test. max_salary1 gives incorrect results. If you remove the max_salary2 column it gives the correct results. Please excuse the lack of sanity with the query. I had to do it this way to get 2 columns with NULLs. SELECT depname, empno, salary, salary1, salary2, MAX(salary1) OVER (ORDER BY empno) AS max_salary1, MAX(salary2) OVER() AS max_salary2 FROM (SELECT depname, empno, salary, (CASE WHEN salary 5000 THEN NULL ELSE salary END) AS salary1, (CASE WHEN salary = 5000 THEN NULL ELSE salary END) AS salary2 FROM empsalary ) empsalary; Actual results: depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2 ---+---++-+-+-+- sales | 1 | 5000 |5000 | | |4800 personnel | 2 | 3900 | |3900 | |4800 sales | 3 | 4800 | |4800 | |4800 sales | 4 | 4800 | |4800 | |4800 personnel | 5 | 3500 | |3500 | |4800 develop | 7 | 4200 | |4200 | |4800 develop | 8 | 6000 |6000 | | |4800 develop | 9 | 4500 | |4500 | |4800 develop |10 | 5200 |5200 | | |4800 develop |11 | 5200 |5200 | | |4800 Correct results: depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2 ---+---++-+-+-+- sales | 1 | 5000 |5000 | |5000 |4800 personnel | 2 | 3900 | |3900 |5000 |4800 sales | 3 | 4800 | |4800 |5000 |4800 sales | 4 | 4800 | |4800 |5000 |4800 personnel | 5 | 3500 | |3500 |5000 |4800 develop | 7 | 4200 | |4200 |5000 |4800 develop | 8 | 6000 |6000 | |6000 |4800 develop | 9 | 4500 | |4500 |6000 |4800 develop |10 | 5200 |5200 | |6000 |4800 develop |11 | 5200 |5200 | |6000 |4800 This might be a good regression test once it's fixed. Hmm, did you apply the latest patch correctly? My build can produce right results, so I don't see why it isn't fixed. Make sure the lines around 2420-2430 in planner.c like: /* * must copyObject() to avoid args concatenating with each other. */ pulled_exprs = list_concat(pulled_exprs, copyObject(wfunc-args)); where copyObject() is added. I'm sitting here away from home with a funny feeling I forgot to make install. I think my home adsl has dropped out so can't confirm that. If it works for you and not for me last night then I probably did forget. I'll let you know. I'm not sure if this is related, another bug is found: *** a/src/backend/nodes/equalfuncs.c --- b/src/backend/nodes/equalfuncs.c *** *** 2246,2251 equal(void *a, void *b) --- 2246,2252 break; case T_Aggref: retval = _equalAggref(a, b); +break;
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/27 Tom Lane [EMAIL PROTECTED]: Heikki Linnakangas [EMAIL PROTECTED] writes: Here's another updated patch, including all your bug fixes. I did a very fast pass through this and have a few comments. Thanks for your comments. The executor part is now being refactored by Heikki, but remaining are still on me. Note that some of those are because of my earlier poor understanding. * Don't bother manually updating keywords.sgml. As stated therein, that table is automatically generated. All you'll accomplish is to cause merge problems. (The effort would be a lot better spent on the non-boilerplate parts of the docs anyway.) OK, I intend nothing here but didn't know the rule. will remove it. * I assume there's going to be some way to create user-defined window functions? Yes, but for 8.4 no. The window functions will need specific window function APIs rather than existing PG_XXX APIs and the design of them affects how to design the Window executor node. So we are currently desgining the APIs. If it completes in the future users can create their own functions. * It seems fairly unlikely that you can get away with not supporting any qual expression on a Window plan node. What will you do with HAVING qualifiers? Window nodes are executed after any of WHERE, GROUP BY, HAVING, and before ORDER BY. Window nodes don't have qual and HAVING doesn't give any effect to Window operations. * The find_aggref code added to planagg.c (where it doesn't belong anyway) doesn't seem to be used anywhere. It was needed to extract Aggref node in planner once, but not needed anymore. will remove it. * In the same vein, I'm unimpressed with moving GetAggInitVal into execGrouping.c, which it isn't at all related to. I'd just leave it alone and duplicate the code in nodeWindow.c --- it's not exactly large. If you did insist on sharing this code it would be appropriate to change the name and comments to reflect the fact that it's being used for more than just aggregates, anyhow. It is now in the discussion. Since nodeWindow has much duplicated code in initialize/advance/finalize so we wonder if those codes should be shared among the two nodes. If so, GetAggInitVal seems to be shared as well as other aggregate specific code. If we decide to separate them, your suggestion that GetAggInitVal should be duplicated will be sane. * And in the same vein. var.c is hardly the place to put a search-for-wfuncs routine. Agreed, but where to go? clause.c may be, or under parser/ ? * It seems like a coin was flipped to determine whether struct and field names would use window, win, or just w (I find WFunc to be particularly unhelpful to a reader who doesn't already know what it is). Please try to reduce the surprise factor. I'd suggest consistently using window in type names, though win is an OK prefix for field names within window-related structs. I named WFunc as WinFunc once, but sounds too long for such heavily used node. I liked it like Agg, but Win is not appropriate neither is Func. And also, its name is consistent with the added pg_proc column named proiswfunc. I wonder it would be proiswinfunc if we rename WFunc as WinFunc. * This is a bad idea: /* + * OrderClause - + * representation of ORDER BY in Window + */ + typedef SortGroupClause OrderClause; + + + /* + * PartitionClause - + * representaition of PATITION BY in Window + */ + typedef SortGroupClause PartitionClause; If they're just SortGroupClauses, call them that, don't invent an alias. (Yes, I know SortClause and GroupClause used to be aliases. That was a bad idea: it confused matters and required lots of useless duplicated code, except for the places where we didn't duplicate code because we were willing to assume struct equivalence. There's basically just nothing that wins about that approach.) In any case, order and partition are really bad names to be using here given the number of possible other meanings for those terms in a DBMS context. If you actually need separate struct types then names like WindowPartitionClause would be appropriate. This is because I didn't know quite well about windowed table specification earlier (and when I was started the Group and the Sort was separated as you point). And now I can tell the two nodes can be named SortGroupClause, nothing special. * The API changes chosen for func_get_detail seem pretty bizarre. Why didn't you just add a new return code FUNCDETAIL_WINDOW? An aggregate that is existing currently can be used as a window function. But we need to treat it as specialized case. A normal aggregate without OVER clause is GROUP BY aggregate and with OVER clause it's window aggregate. For func_get_detail to determine which aggregate windef variable must be passed. Is it better? And also, block starting with Oops. Time to die comment in ParseFuncOrColumn can be shared among two types. So I thought the two are similar and func_get_detail
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada wrote: 2008/11/20 David Rowley [EMAIL PROTECTED]: -- The following query gives incorrect results on the -- maxhighbid column SELECT auctionid, category, description, highestbid, reserve, MAX(highestbid) OVER (ORDER BY auctionid) AS maxhighbid, MAX(reserve) OVER() AS highest_reserve FROM auction_items; If you remove the highest_reserve column you get the correct results. The bug also affects MIN, AVG, COUNT, STDDEV but not SUM. Good report! I fixed this bug, which was by a trival missuse of list_concat() in the planner. This was occurred when the first aggregate trans func is strict and the second aggregate argument may be null. Yep, the argument of the second was implicitly passed to the first wrongly. That's why it didn't occur if you delete the second MAX(). I added a test case with the existing data emulating yours (named strict aggs) but if it is wrong, let me know. It's not quite right yet. I'm also getting regression tests failing on window. Let me know if you want the diffs. I've created a query that uses the table in your regression test. max_salary1 gives incorrect results. If you remove the max_salary2 column it gives the correct results. Please excuse the lack of sanity with the query. I had to do it this way to get 2 columns with NULLs. SELECT depname, empno, salary, salary1, salary2, MAX(salary1) OVER (ORDER BY empno) AS max_salary1, MAX(salary2) OVER() AS max_salary2 FROM (SELECT depname, empno, salary, (CASE WHEN salary 5000 THEN NULL ELSE salary END) AS salary1, (CASE WHEN salary = 5000 THEN NULL ELSE salary END) AS salary2 FROM empsalary ) empsalary; Actual results: depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2 ---+---++-+-+-+- sales | 1 | 5000 |5000 | | |4800 personnel | 2 | 3900 | |3900 | |4800 sales | 3 | 4800 | |4800 | |4800 sales | 4 | 4800 | |4800 | |4800 personnel | 5 | 3500 | |3500 | |4800 develop | 7 | 4200 | |4200 | |4800 develop | 8 | 6000 |6000 | | |4800 develop | 9 | 4500 | |4500 | |4800 develop |10 | 5200 |5200 | | |4800 develop |11 | 5200 |5200 | | |4800 Correct results: depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2 ---+---++-+-+-+- sales | 1 | 5000 |5000 | |5000 |4800 personnel | 2 | 3900 | |3900 |5000 |4800 sales | 3 | 4800 | |4800 |5000 |4800 sales | 4 | 4800 | |4800 |5000 |4800 personnel | 5 | 3500 | |3500 |5000 |4800 develop | 7 | 4200 | |4200 |5000 |4800 develop | 8 | 6000 |6000 | |6000 |4800 develop | 9 | 4500 | |4500 |6000 |4800 develop |10 | 5200 |5200 | |6000 |4800 develop |11 | 5200 |5200 | |6000 |4800 This might be a good regression test once it's fixed. I'm at a bit of a loss to what to do now. Should I wait for your work Heikki? Or continue validating this patch? The best thing I can think to do right now is continue and any problems I find you can add regression tests for, then if we keep your regression tests for Heikki's changes then we can validate those changes more quickly. Any thoughts? Better ideas? David. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/26 David Rowley [EMAIL PROTECTED]: Hitoshi Harada wrote: 2008/11/20 David Rowley [EMAIL PROTECTED]: -- The following query gives incorrect results on the -- maxhighbid column SELECT auctionid, category, description, highestbid, reserve, MAX(highestbid) OVER (ORDER BY auctionid) AS maxhighbid, MAX(reserve) OVER() AS highest_reserve FROM auction_items; If you remove the highest_reserve column you get the correct results. The bug also affects MIN, AVG, COUNT, STDDEV but not SUM. Good report! I fixed this bug, which was by a trival missuse of list_concat() in the planner. This was occurred when the first aggregate trans func is strict and the second aggregate argument may be null. Yep, the argument of the second was implicitly passed to the first wrongly. That's why it didn't occur if you delete the second MAX(). I added a test case with the existing data emulating yours (named strict aggs) but if it is wrong, let me know. It's not quite right yet. I'm also getting regression tests failing on window. Let me know if you want the diffs. I've created a query that uses the table in your regression test. max_salary1 gives incorrect results. If you remove the max_salary2 column it gives the correct results. Please excuse the lack of sanity with the query. I had to do it this way to get 2 columns with NULLs. SELECT depname, empno, salary, salary1, salary2, MAX(salary1) OVER (ORDER BY empno) AS max_salary1, MAX(salary2) OVER() AS max_salary2 FROM (SELECT depname, empno, salary, (CASE WHEN salary 5000 THEN NULL ELSE salary END) AS salary1, (CASE WHEN salary = 5000 THEN NULL ELSE salary END) AS salary2 FROM empsalary ) empsalary; Actual results: depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2 ---+---++-+-+-+- sales | 1 | 5000 |5000 | | |4800 personnel | 2 | 3900 | |3900 | |4800 sales | 3 | 4800 | |4800 | |4800 sales | 4 | 4800 | |4800 | |4800 personnel | 5 | 3500 | |3500 | |4800 develop | 7 | 4200 | |4200 | |4800 develop | 8 | 6000 |6000 | | |4800 develop | 9 | 4500 | |4500 | |4800 develop |10 | 5200 |5200 | | |4800 develop |11 | 5200 |5200 | | |4800 Correct results: depname | empno | salary | salary1 | salary2 | max_salary1 | max_salary2 ---+---++-+-+-+- sales | 1 | 5000 |5000 | |5000 |4800 personnel | 2 | 3900 | |3900 |5000 |4800 sales | 3 | 4800 | |4800 |5000 |4800 sales | 4 | 4800 | |4800 |5000 |4800 personnel | 5 | 3500 | |3500 |5000 |4800 develop | 7 | 4200 | |4200 |5000 |4800 develop | 8 | 6000 |6000 | |6000 |4800 develop | 9 | 4500 | |4500 |6000 |4800 develop |10 | 5200 |5200 | |6000 |4800 develop |11 | 5200 |5200 | |6000 |4800 This might be a good regression test once it's fixed. Hmm, did you apply the latest patch correctly? My build can produce right results, so I don't see why it isn't fixed. Make sure the lines around 2420-2430 in planner.c like: /* * must copyObject() to avoid args concatenating with each other. */ pulled_exprs = list_concat(pulled_exprs, copyObject(wfunc-args)); where copyObject() is added. I'm not sure if this is related, another bug is found: *** a/src/backend/nodes/equalfuncs.c --- b/src/backend/nodes/equalfuncs.c *** *** 2246,2251 equal(void *a, void *b) --- 2246,2252 break; case T_Aggref: retval = _equalAggref(a, b); +break; case T_WFunc: retval = _equalWFunc(a, b); break; don't laugh at... could you try it and test again? If whole the new patch is needed, tell me then will send it. I'm at a bit of a loss to what to do now. Should I wait for your work Heikki? Or continue validating this patch? The best thing I can think to do right now is continue
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/25 Hitoshi Harada [EMAIL PROTECTED]: 2008/11/25 Heikki Linnakangas [EMAIL PROTECTED]: Here's an updated patch, where the rows are fetched on-demand. Good! And I like the fetching args by number better. Let me take more time to look in detail... I read more, and your spooling approach seems flexible for both now and the furture. Looking at only current release, the frame with ORDER BY is done by detecting peers in WinFrameGetArg() and add row number of peers to winobj-currentpos. Actually if we have capability to spool all rows we need on demand, the frame would be only a boundary problem. It seems to me that eval_windowaggregate() also should use frame APIs. Only things we have to care is the shrinking frame, which is not supported in this release. So I'd suggest winobj-aggregatedupto to be removed. Is there objection? Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada wrote: 2008/11/20 David Rowley [EMAIL PROTECTED]: I won't be around until Monday evening (Tuesday morning JST). I'll pickup the review again there. It's really time for me to push on with this review. The patch is getting closer to getting the thumbs up from me. I'm really hoping that can happen on Monday. Then it's over to Heikki for more code feedback. This time I only fixed some bugs and rebased against the HEAD, but did not refactored. I can figure out some part of what Heikki claimed but not all, so I'd like to see what he will send and post another patch if needed. Thanks! Here's what I've got this far I merged your latest patch into this, as well as latest changes from CVS HEAD. It's a bit of a mess, but here's the big changes I've done: - Removed all the iterator stuff. You can just call WinGetPart/FrameGetArg repeatedly. It ought to be just as fast if done right in nodeWindow.c. - Removed all the Shrinked/Extended stuff, as it's not needed until we have full support for window frames. - Removed all the WinRow* functions, you can just call WinFrame/PartGet* functions, using WINDOW_SEEK_CURRENT - Removed WinFrame/PartGetTuple functions. They were only used for determining if two tuples are peer with each other, so I added a WinRowIsPeer function instead. - Removed all the buffering strategy stuff. Currently, the whole partition is always materialized. That's of course not optimal; I'm still thinking we should just read the tuples from the outer node lazily, on-demand, instead. To avoid keeping all tuples in the partition in tuplestore, when not needed, should add an extra function to trim the tuplestore. There's now a lot less code in nodeWindow.c, and I'm starting to understand most of what-s left :-). TODO - clean up the comments and other mess. - Modify WinPart/FrameGetArg so that it's passed the parameter number instead of an Expr node. I'll continue working on the executor, but please let me know what you think. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/24 Heikki Linnakangas [EMAIL PROTECTED]: Hitoshi Harada wrote: 2008/11/20 David Rowley [EMAIL PROTECTED]: I won't be around until Monday evening (Tuesday morning JST). I'll pickup the review again there. It's really time for me to push on with this review. The patch is getting closer to getting the thumbs up from me. I'm really hoping that can happen on Monday. Then it's over to Heikki for more code feedback. This time I only fixed some bugs and rebased against the HEAD, but did not refactored. I can figure out some part of what Heikki claimed but not all, so I'd like to see what he will send and post another patch if needed. Thanks! Here's what I've got this far I merged your latest patch into this, as well as latest changes from CVS HEAD. It's a bit of a mess, but here's the big changes I've done: - Removed all the iterator stuff. You can just call WinGetPart/FrameGetArg repeatedly. It ought to be just as fast if done right in nodeWindow.c. - Removed all the Shrinked/Extended stuff, as it's not needed until we have full support for window frames. - Removed all the WinRow* functions, you can just call WinFrame/PartGet* functions, using WINDOW_SEEK_CURRENT - Removed WinFrame/PartGetTuple functions. They were only used for determining if two tuples are peer with each other, so I added a WinRowIsPeer function instead. - Removed all the buffering strategy stuff. Currently, the whole partition is always materialized. That's of course not optimal; I'm still thinking we should just read the tuples from the outer node lazily, on-demand, instead. To avoid keeping all tuples in the partition in tuplestore, when not needed, should add an extra function to trim the tuplestore. There's now a lot less code in nodeWindow.c, and I'm starting to understand most of what-s left :-). TODO - clean up the comments and other mess. - Modify WinPart/FrameGetArg so that it's passed the parameter number instead of an Expr node. I'll continue working on the executor, but please let me know what you think. It is amazing changes and I like it. Actually I wanted to get it slimer but hadn't found the way. two points, frame concept and window_gettupleslot If you keep this in your mind, please don't be annoyed but the current frame concept is wrong. -- yours sample=# select depname, empno, salary, last_value(empno) over(order by salary) from empsalary; depname | empno | salary | last_value ---+---++ personnel | 5 | 3500 | 5 personnel | 2 | 3900 | 2 develop | 7 | 4200 | 7 develop | 9 | 4500 | 9 sales | 4 | 4800 | 4 sales | 3 | 4800 | 3 sales |b1 | 5000 | 1 develop |11 | 5200 | 11 develop |10 | 5200 | 10 develop | 8 | 6000 | 8 (10 rows) -- mine sample=# select depname, empno, salary, last_value(empno) over(order by salary) from empsalary; depname | empno | salary | last_value ---+---++ personnel | 5 | 3500 | 5 personnel | 2 | 3900 | 2 develop | 7 | 4200 | 7 develop | 9 | 4500 | 9 sales | 4 | 4800 | 3 sales | 3 | 4800 | 3 sales | 1 | 5000 | 1 develop |11 | 5200 | 10 develop |10 | 5200 | 10 develop | 8 | 6000 | 8 (10 rows) Note that on empno=4 then last_value=4(yours)/3(mine), which means the frame is applied to last_value() as well as nth_value() and first_value(). Not only to aggregates. So these lines in nodeWindow.c, /* * If there is an ORDER BY (we don't support other window frame * specifications yet), the frame runs from first row of the partition * to the current row. Otherwise the frame is the whole partition. */ if (((Window *) winobj-winstate-ss.ps.plan)-ordNumCols == 0) max_pos = winobj-partition_rows - 1; else max_pos = winobj-currentpos; max_pos is bogus. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW means max_pos would be currentpos + rows_peers. I looked over the patch and aggregates care it with winobj-aggregatedupto but what about other window functions? Then window_gettupleslot looks like killing performance in some cases. What if the partition has millions of rows and wfunc1 seeks the head and wfunc2 seeks the tail? So as you point it is possible to define frame and partition while scanning current rows rather than before scanning, I felt it'd cost more. But I know this is work in progress, so you probably think about the solutions. What do you think? Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/25 Heikki Linnakangas [EMAIL PROTECTED]: Hitoshi Harada wrote: If you keep this in your mind, please don't be annoyed but the current frame concept is wrong. ... Note that on empno=4 then last_value=4(yours)/3(mine), which means the frame is applied to last_value() as well as nth_value() and first_value(). Not only to aggregates. So these lines in nodeWindow.c, Yes, you're right. Hmm, I wonder if we should implement first_value, last_value and nth_value as regular aggregates. That way, all the window functions would operate on the partition, not caring about the frame, and all the aggregates would operate on the frame. No way. Current specification that we don't get other frames than with/without ORDER BY doesn't have conflict with your proposal. However, thinking about the future, we decided to add window aggregates with wfunc='t', with subfunc for shrinking frame performance up. The direction we should head for is convert aggregates to subset of window functions, not vice versa. Then window_gettupleslot looks like killing performance in some cases. What if the partition has millions of rows and wfunc1 seeks the head and wfunc2 seeks the tail? Yeah, we probably should do something about that. You used several different read pointers to the tuplestore in your patch, one for frame head, another for frame tail, another for partition head etc., but I think that potentially suffers from the same problem. Perhaps we should have one read pointer for each window function, plus one reading the current row? It seems likely that one window function is not going to hop around wildly, and the behavior wouldn't depend so much on what other window functions are being used. Yes. My patch also has performance problem in seeking but is better than one read pointer. So I like your proposal to attach a read pointer with each wfuncs. I am not sure what kind of window functions the user will define in the future, but for current use cases it comes reasonable. Anyway, only one read pointer will make problems, I guess. So as you point it is possible to define frame and partition while scanning current rows rather than before scanning, I felt it'd cost more. But I know this is work in progress, so you probably think about the solutions. What do you think? Here's an updated patch, where the rows are fetched on-demand. Good! And I like the fetching args by number better. Let me take more time to look in detail... Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
I wrote: All, This is my first patch review for PostgreSQL. I did submit a patch last commit fest (Boyer-Moore) so I feel I should review one this commit fest. I'm quite new to PostgreSQL so please don't rely on me totally. I'll do my best. Heikki is also reviewing this patch which makes me feel better. My aim is to get the author has much feed back as quickly as possible. For this reason I'm going to be breaking down my reviews into the following topics. 1. Does patch apply cleanly? 2. Any compiler warnings? 3. Do the results follow the SQL standard? 4. Performance Comparison, does it perform better than alternate ways of doing things. Self joins, sub queries etc. 5. Performance, no comparison. How does it perform with larger tables? I regret starting the individual thread for each function now. I was expecting them to get longer. So I'll use this one for the remainder of the standard conformance tests. I covered all of the non-aggregate functions in previous tests. I will do more final tests on them soon. Tonight I started testing the aggregate functions with NULLable columns and I've found a small problem. I'll just post my test script to make it easy for you to see what I mean. BEGIN WORK; CREATE TABLE auction_items ( auctionid INT NOT NULL, category VARCHAR(32) NOT NULL, description VARCHAR(128) NOT NULL, highestbid INT NULL, -- NULL when no bids reserve INT NULL, -- NULL when no reserve started TIMESTAMP NOT NULL, -- When the action opened days INT NOT NULL, -- how many days the auction runs. CHECK(days 0), PRIMARY KEY (auctionid) ); INSERT INTO auction_items VALUES(1,'BIKE','Broken Bicycle',NULL,100,NOW(),10); INSERT INTO auction_items VALUES(2,'BIKE','Mountain Bike',120,NULL,NOW(),10); INSERT INTO auction_items VALUES(3,'BIKE','Racer',230,NULL,NOW(),7); INSERT INTO auction_items VALUES(4,'CAR','Bmw M3',5000,6000,NOW(),7); INSERT INTO auction_items VALUES(5,'CAR','Audi S3',NULL,6500,NOW(),7); INSERT INTO auction_items VALUES(6,'CAR','Holden Kingswood',NULL,2000,NOW(),7); COMMIT; -- The following query gives incorrect results on the -- maxhighbid column SELECT auctionid, category, description, highestbid, reserve, MAX(highestbid) OVER (ORDER BY auctionid) AS maxhighbid, MAX(reserve) OVER() AS highest_reserve FROM auction_items; If you remove the highest_reserve column you get the correct results. The bug also affects MIN, AVG, COUNT, STDDEV but not SUM. I've also had a little look at the documentation included in the patch. At first scan the only thing I can see that is wrong is + +para + Window functions are put in the commandSELECT/command list. + It is forbidden anywhere else such as literalGROUP BY/literal, I think this needs to be rewritten as they are allowed in the ORDER BY clause, works in patch and spec says the same. Maybe it should read something more like: Window functions are only permitted in the commandSELECT/command and the commandORDER BY/command clause of the query. They are forbidden anywhere else... etc. I won't be around until Monday evening (Tuesday morning JST). I'll pickup the review again there. It's really time for me to push on with this review. The patch is getting closer to getting the thumbs up from me. I'm really hoping that can happen on Monday. Then it's over to Heikki for more code feedback. Keep up the good work. David. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/9 David Rowley [EMAIL PROTECTED]: Hitoshi Harada wrote: I recreate the patch against current HEAD, in the git it's here: http://git.postgresql.org/?p=postgresql.git;a=commit;h=f88970d3c6fb9f99543 d873bb7228f4c057c23e0 I tested `patch -p1` with the attached and succeeded to make it work cleanly. It seems to me that this patch can be applied until another pg_proc.h entry would introduced in the HEAD. Regards, -- Hitoshi Harada Thanks Hitoshi, You've done what I planned to do first thing this morning. I was planning to write a script that checks for the new lines then I could have removed those, patched then re-added them with the extra f. It looks like it's not easy to keep it working with CVS head. I counted 3 patches that touched pg_proc.h just this month. I must have missed another as I still couldn't get it to apply. Hence the need for the script. Thanks for the new patch I'll get testing now. David. I'm glad to hear that. Actually thanks to git it is quite easy for me to merge my own repository with the HEAD. It tells me which lines are new coming and which lines I modified are newer than else in CVS. This is my first project where I use git (and I am not guru of cvs either -:) but I love it now. So feel free to tell me to recreate a new patch against head. Not so heavy work as making on-the-fly script. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Using one of my original test tables I'm testing windowing functions with a GROUP BY. The following query works as I would expect. -- Works SELECT department, SUM(Salary), ROW_NUMBER() OVER (ORDER BY department), SUM(SUM(salary)) OVER (ORDER BY department) FROM employees GROUP BY department; The following one fails with the message. ERROR: variable not found in subplan target list -- Does not work. SELECT department, SUM(Salary), ROW_NUMBER() OVER (ORDER BY department), SUM(SUM(salary)) OVER (ORDER BY department DESC) FROM employees GROUP BY department; I just added the DESC to force it into creating 2 separate windows. I can re-write the non working query to work using the following: SELECT department, salary, ROW_NUMBER() OVER (ORDER BY department), SUM(salary) OVER (ORDER BY department DESC) FROM (SELECT department, SUM(salary) AS salary FROM employees GROUP BY department ) t; Testing with: create table employees ( id INT primary key, name varchar(30) not null, department varchar(30) not null, salary int not null, check (salary = 0) ); insert into employees values(1,'Jeff','IT',1); insert into employees values(2,'Sam','IT',12000); insert into employees values(3,'Richard','Manager',3); insert into employees values(4,'Ian','Manager',2); insert into employees values(5,'John','IT',6); insert into employees values(6,'Matthew','Director',6); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada wrote: I'm glad to hear that. Actually thanks to git it is quite easy for me to merge my own repository with the HEAD. It tells me which lines are new coming and which lines I modified are newer than else in CVS. This is my first project where I use git (and I am not guru of cvs either -:) but I love it now. So feel free to tell me to recreate a new patch against head. Not so heavy work as making on-the-fly script. CUME_DIST and COUNT(*) seem to be working as expected now with the original tests. I'll continue to test more and also test the other windowing functions that I've not got to yet. I did receive a small warning from the compiler when compiling nodeWindow.c. It was complaining about a possible un-initialised f_shrinking. Just to keep it quiet I changed the break's to return's after the elog() call. *** nodeWindow.c2008-11-09 10:54:50.0 + --- nodeWindow.c2008-11-09 10:39:24.0 + *** *** 818,824 case FRAME_CURRENT_RANGE: /* UNSUPPORTED */ elog(ERROR, unknown preceding type %d, node-preceding_type); ! break; case FRAME_VALUE_ROWS: if (node-preceding_rows 0) f_shrinking = 0; --- 818,824 case FRAME_CURRENT_RANGE: /* UNSUPPORTED */ elog(ERROR, unknown preceding type %d, node-preceding_type); ! return; /* keep compiler quiet */ case FRAME_VALUE_ROWS: if (node-preceding_rows 0) f_shrinking = 0; *** *** 922,928 case FRAME_CURRENT_RANGE: /* UNSUPPORTED */ elog(ERROR, unknown preceding type %d, node-preceding_type); ! break; case FRAME_VALUE_ROWS: if (node-preceding_rows = winobj-p_currentpos + 1) f_shrinking = 1; --- 922,928 case FRAME_CURRENT_RANGE: /* UNSUPPORTED */ elog(ERROR, unknown preceding type %d, node-preceding_type); ! return; /* keep compiler quiet */ case FRAME_VALUE_ROWS: if (node-preceding_rows = winobj-p_currentpos + 1) f_shrinking = 1; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada wrote: I recreate the patch against current HEAD, in the git it's here: http://git.postgresql.org/?p=postgresql.git;a=commit;h=f88970d3c6fb9f99543 d873bb7228f4c057c23e0 I tested `patch -p1` with the attached and succeeded to make it work cleanly. It seems to me that this patch can be applied until another pg_proc.h entry would introduced in the HEAD. Regards, -- Hitoshi Harada Thanks Hitoshi, You've done what I planned to do first thing this morning. I was planning to write a script that checks for the new lines then I could have removed those, patched then re-added them with the extra f. It looks like it's not easy to keep it working with CVS head. I counted 3 patches that touched pg_proc.h just this month. I must have missed another as I still couldn't get it to apply. Hence the need for the script. Thanks for the new patch I'll get testing now. David. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Hitoshi Harada Wrote: although attached is the whole (split) patch. I'm having some trouble getting these patches to patch cleanly. I think it's because of this that I can't get postgres to compile after applying the patch. It errors out at tuptoaster.c some constants seem to be missing from fmgroids.h If I open src/include/utils/fmgroids.h The only constaint being defined is: #define F__NULL_ 31 I'd assume it was a problem with my setup, but the CVS head compiles fine. Let me know if I'm doing something wrong. Here is the output from patch and the compile errors. [EMAIL PROTECTED]:~/pg8.4/pgsql$ patch -p1 window_functions.patch.20081107-1 patching file doc/src/sgml/func.sgml Hunk #1 succeeded at 10061 (offset 11 lines). patching file doc/src/sgml/keywords.sgml patching file doc/src/sgml/queries.sgml patching file doc/src/sgml/query.sgml patching file doc/src/sgml/ref/select.sgml patching file doc/src/sgml/syntax.sgml patching file src/backend/catalog/pg_aggregate.c patching file src/backend/catalog/pg_proc.c patching file src/backend/commands/explain.c patching file src/backend/executor/Makefile patching file src/backend/executor/execAmi.c patching file src/backend/executor/execGrouping.c patching file src/backend/executor/execProcnode.c patching file src/backend/executor/execQual.c Hunk #3 succeeded at 4167 (offset 14 lines). patching file src/backend/executor/nodeAgg.c patching file src/backend/executor/nodeWindow.c patching file src/backend/nodes/copyfuncs.c patching file src/backend/nodes/equalfuncs.c patching file src/backend/nodes/nodeFuncs.c patching file src/backend/nodes/outfuncs.c patching file src/backend/nodes/readfuncs.c patching file src/backend/optimizer/path/allpaths.c patching file src/backend/optimizer/path/equivclass.c patching file src/backend/optimizer/plan/createplan.c patching file src/backend/optimizer/plan/planagg.c patching file src/backend/optimizer/plan/planmain.c patching file src/backend/optimizer/plan/planner.c patching file src/backend/optimizer/plan/setrefs.c patching file src/backend/optimizer/plan/subselect.c patching file src/backend/optimizer/prep/prepjointree.c patching file src/backend/optimizer/util/clauses.c patching file src/backend/optimizer/util/tlist.c patching file src/backend/optimizer/util/var.c patching file src/backend/parser/analyze.c patching file src/backend/parser/gram.y Hunk #6 succeeded at 6433 (offset 8 lines). Hunk #7 succeeded at 6443 (offset 8 lines). Hunk #8 succeeded at 8212 (offset 9 lines). Hunk #9 succeeded at 8220 (offset 9 lines). Hunk #10 succeeded at 8232 (offset 9 lines). Hunk #11 succeeded at 8244 (offset 9 lines). Hunk #12 succeeded at 8256 (offset 9 lines). Hunk #13 succeeded at 8272 (offset 9 lines). Hunk #14 succeeded at 8284 (offset 9 lines). Hunk #15 succeeded at 8306 (offset 9 lines). Hunk #16 succeeded at 8754 (offset 9 lines). Hunk #17 succeeded at 9629 (offset 9 lines). Hunk #18 succeeded at 9637 (offset 9 lines). Hunk #19 succeeded at 9703 (offset 9 lines). Hunk #20 succeeded at 9720 (offset 9 lines). Hunk #21 succeeded at 9772 (offset 9 lines). Hunk #22 succeeded at 9959 (offset 9 lines). Hunk #23 succeeded at 9980 (offset 9 lines). patching file src/backend/parser/keywords.c patching file src/backend/parser/parse_agg.c patching file src/backend/parser/parse_clause.c patching file src/backend/parser/parse_expr.c patching file src/backend/parser/parse_func.c patching file src/backend/rewrite/rewriteManip.c patching file src/backend/utils/adt/Makefile patching file src/backend/utils/adt/ruleutils.c patching file src/backend/utils/adt/wfunc.c patching file src/backend/utils/sort/tuplestore.c patching file src/include/catalog/pg_attribute.h patching file src/include/catalog/pg_class.h [EMAIL PROTECTED]:~/pg8.4/pgsql$ patch -p1 window_functions.patch.20081107-2 (Stripping trailing CRs from patch.) patching file src/include/catalog/pg_proc.h Hunk #4 FAILED at 106. 1 out of 4 hunks FAILED -- saving rejects to file src/include/catalog/pg_proc.h.rej (Stripping trailing CRs from patch.) patching file src/include/executor/executor.h (Stripping trailing CRs from patch.) patching file src/include/executor/nodeWindow.h (Stripping trailing CRs from patch.) patching file src/include/nodes/execnodes.h Hunk #3 succeeded at 509 (offset 2 lines). Hunk #4 succeeded at 1586 (offset 2 lines). (Stripping trailing CRs from patch.) patching file src/include/nodes/nodes.h (Stripping trailing CRs from patch.) patching file src/include/nodes/parsenodes.h (Stripping trailing CRs from patch.) patching file src/include/nodes/plannodes.h (Stripping trailing CRs from patch.) patching file src/include/nodes/primnodes.h (Stripping trailing CRs from patch.) patching file src/include/nodes/relation.h (Stripping trailing CRs from patch.) patching file src/include/optimizer/planmain.h (Stripping trailing CRs from patch.) patching file src/include/optimizer/var.h (Stripping trailing CRs from patch.) patching file src/include/parser/parse_agg.h
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/9 David Rowley [EMAIL PROTECTED]: Hitoshi Harada Wrote: although attached is the whole (split) patch. I'm having some trouble getting these patches to patch cleanly. I think it's because of this that I can't get postgres to compile after applying the patch. It errors out at tuptoaster.c some constants seem to be missing from fmgroids.h If I open src/include/utils/fmgroids.h The only constaint being defined is: #define F__NULL_ 31 I'd assume it was a problem with my setup, but the CVS head compiles fine. Let me know if I'm doing something wrong. patching file src/include/catalog/pg_proc.h Hunk #4 FAILED at 106. 1 out of 4 hunks FAILED -- saving rejects to file src/include/catalog/pg_proc.h.rej As it says pg_proc.h may have conflicts. The patch is not against HEAD but based on the same as the previous (v08) patch. I am remote now so I'm not sure but could you try to find conflicts in pg_proc.h? The pg_proc catalog has been added 1 column called prociswfunc (bool) in the patch. All you have to do is add 'f' in the middle of new-coming lines. Sorry for annoying, and I'll be back in hours. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
David Rowley [EMAIL PROTECTED] writes: patching file src/include/catalog/pg_proc.h Hunk #4 FAILED at 106. 1 out of 4 hunks FAILED -- saving rejects to file src/include/catalog/pg_proc.h.rej I imagine you'll find that hunk #4 covers the entire DATA() body of the file :-(. It can't possibly apply cleanly if anyone's added or altered pg_proc entries since the patch was made. What you'd need to do is manually insert proiswfunc = 'f' entries in all the existing DATA lines (this is usually not too hard with sed or an emacs macro), then add whatever new functions the patch defines. Even figuring out the latter from the patch representation can be a serious PITA, since they'll be a few lines out of a multi-thousand-line failed diff hunk. I'm not sure if Hitoshi is in a position to submit the pg_proc changes as two separate diffs --- one to add the new column and a separate one to add in the new functions --- but it'd be a lot easier to deal with merge issues if he could. (Now I'll sit back and wait for some fanboy to claim that $his_favorite_scm could solve this automatically ...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Instead of a patch it might be easier to submit the new columns as a perl script or sed command. We do something like that to make merging pg_proc easier. greg On 8 Nov 2008, at 01:36 PM, Tom Lane [EMAIL PROTECTED] wrote: David Rowley [EMAIL PROTECTED] writes: patching file src/include/catalog/pg_proc.h Hunk #4 FAILED at 106. 1 out of 4 hunks FAILED -- saving rejects to file src/include/catalog/pg_proc.h.rej I imagine you'll find that hunk #4 covers the entire DATA() body of the file :-(. It can't possibly apply cleanly if anyone's added or altered pg_proc entries since the patch was made. What you'd need to do is manually insert proiswfunc = 'f' entries in all the existing DATA lines (this is usually not too hard with sed or an emacs macro), then add whatever new functions the patch defines. Even figuring out the latter from the patch representation can be a serious PITA, since they'll be a few lines out of a multi-thousand- line failed diff hunk. I'm not sure if Hitoshi is in a position to submit the pg_proc changes as two separate diffs --- one to add the new column and a separate one to add in the new functions --- but it'd be a lot easier to deal with merge issues if he could. (Now I'll sit back and wait for some fanboy to claim that $his_favorite_scm could solve this automatically ...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Quoted from SQL:2008 If CUME_DIST is specified, then the relative *rank *of a row R is defined as NP/NR, where NP is defined to be the number of rows preceding or peer with R in the window ordering of the window partition of R and NR is defined to be the number of rows in the window partition of R. I guess there is a difference between row_number and number of rows preceding or peer with R number of rows preceding or peer with R == count(*) over (order by salary) As far as I understand, the following query should calculate cume_dist properly (and it does so in Oracle): SELECT name,CAST(r AS FLOAT) / c, cd FROM (SELECT name, COUNT(*) OVER(ORDER BY salary) as r, COUNT(*) OVER() AS c, CUME_DIST() OVER(ORDER BY salary) AS cd FROM employees ) t; Sincerely yours, Vladimir Sitnikov
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/5 Vladimir Sitnikov [EMAIL PROTECTED]: Quoted from SQL:2008 If CUME_DIST is specified, then the relative rank of a row R is defined as NP/NR, where NP is defined to be the number of rows preceding or peer with R in the window ordering of the window partition of R and NR is defined to be the number of rows in the window partition of R. I guess there is a difference between row_number and number of rows preceding or peer with R number of rows preceding or peer with R == count(*) over (order by salary) As far as I understand, the following query should calculate cume_dist properly (and it does so in Oracle): SELECT name,CAST(r AS FLOAT) / c, cd FROM (SELECT name, COUNT(*) OVER(ORDER BY salary) as r, COUNT(*) OVER() AS c, CUME_DIST() OVER(ORDER BY salary) AS cd FROM employees ) t; I'm afraid I misinterpreted it. As you say, number of rows preceding == row_number() and rumber of rows preceding or peers to R != row_number() (neither rank()) peers to R in the window function context means same rows by the ORDER BY clause, so in the first example, id=5 and id=6 are peers and in both rows, NP should be 6, as Oracle and Sybase say. Even though I understand the definition, your suggestion of COUNT(*) OVER (ORDER BY salary) doesn't make sense. In the patch, it simply returns the same value as row_number() but is it wrong, too? Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
Even though I understand the definition, your suggestion of COUNT(*) OVER (ORDER BY salary) doesn't make sense. Why does not that make sense? I have not read the spec, however Oracle has a default window specification in case there is only an order by clause. The default window is range between unbounded preceding and current row. count(*) over (order by salary range between unbounded preceding and current row) is perfectly identical to the number of rows preceding or peers to R by the definition, isn't it? I see here a word-by-word translation from SQL to the English and vice versa. If the patch returns row_number it is wrong since there is no way for row_number to be a number of rows preceding or peer with R, is there? Regards, Vladimir Sitnikov
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/5 Vladimir Sitnikov [EMAIL PROTECTED]: Even though I understand the definition, your suggestion of COUNT(*) OVER (ORDER BY salary) doesn't make sense. Why does not that make sense? I have not read the spec, however Oracle has a default window specification in case there is only an order by clause. The default window is range between unbounded preceding and current row. count(*) over (order by salary range between unbounded preceding and current row) is perfectly identical to the number of rows preceding or peers to R by the definition, isn't it? I see here a word-by-word translation from SQL to the English and vice versa. If the patch returns row_number it is wrong since there is no way for row_number to be a number of rows preceding or peer with R, is there? I've got it. I had thought that implicit window framing specified by ORDER BY clause (such like above) would mean ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. But actually reading the spec more closely it says: Otherwise, WF consists of all rows of the partition of R that precede R or are peers of R in the window ordering of the window partition defined by the window ordering clause. So it means RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as you pointed. And the result of count(*) OVER (ORDER BY salary) doesn't equal to row_number(). Now my assumption is broken. Let me take time to think about how to solve it... Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers