Re: [HACKERS] Underspecified window queries in regression tests
I have used your notes below to rewrite the Window function SQL manual section. As you said, it was very hard to read. I now understand it better, having restructured it, and I hope others do too. After waiting 30 minutes for our developer doc build to refresh, I am giving up and posting my own URL for the doc changes: http://momjian.us/tmp/pgsql/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS Perhaps I need to go back to having my own doc build. --- On Mon, Oct 17, 2011 at 11:48:38AM +0200, Florian Pflug wrote: On Oct17, 2011, at 01:09 , Tom Lane wrote: Florian Pflug f...@phlo.org writes: ... reading those parts again, I realize the it says When ORDER BY is omitted the *default* frame consists ... , and that the second quote is followed by a footnote which says There are options to define the window frame in other ways, but this tutorial does not cover them. See Section 4.2.8 for details. [3.5, Window Functions] So it was just me being thick. Sorry for the noise. Hmm. Maybe the use of a footnote there is too subtle, and we should instead have that text in-line (probably in parentheses)? Or we could use a note, but that's probably too much emphasis. Inline and in parentheses sounds fine. In addition, I think we should reword the explanation in 4.2.8 (The SQL Language / SQL Syntax / Value Expressions / Window Functions). Instead of that rather long (and IMHO hard to read) paragraph about possible frame clauses and their behaviour in the presence or absence of an ORDER BY clause, we should go with a more algorithmic explanation I think. Something along these lines maybe: -- .) PARTITION BY splits the rows into disjoint partitions. All further processing happens only inside a single partition .) In RANGE mode, ORDER BY then splits each partition into an ordered list of sub-partitions, each containing rows which the ORDER BY considers to be equivalent. .) In ROWS mode, OTOH, each sub-partition contains only a single row. Thus, if there are rows which are considered to be equivalent by the ORDER BY, the ordering of the sub-partition isn't fully determined. .) Each row's frame then consists of some consecutive range of sub-partitions. .) In RANGE mode, that consecutive range can only start at either the first sub-partition or the current row's sub-partition, and can only end at either the current row's sub-partition or the last sub-partitions. .) In ROWS mode, the consecutive range may additional start n sub-partitions (or rows, it's the same thing here) before the current row, and may additionally end m sub-partitions/rows after the current row. From that, it follows that even with an underspecified sort order, the contents of each frame are still fully determined in RANGE mode. The ordering of rows within a frame is not determined, though. So overall, in RANGE mode, a query's result is only non-deterministic if the window function is sensitive to the ordering of rows within a frame. In ROWS mode, OTOH, the contents each frame themselves are not fully determined, so even an ordering agnostic window function may produce non-deterministic results. -- If you think that something along these lines would be an improvement, I can try to come up with a patch. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
On Oct17, 2011, at 01:09 , Tom Lane wrote: Florian Pflug f...@phlo.org writes: ... reading those parts again, I realize the it says When ORDER BY is omitted the *default* frame consists ... , and that the second quote is followed by a footnote which says There are options to define the window frame in other ways, but this tutorial does not cover them. See Section 4.2.8 for details. [3.5, Window Functions] So it was just me being thick. Sorry for the noise. Hmm. Maybe the use of a footnote there is too subtle, and we should instead have that text in-line (probably in parentheses)? Or we could use a note, but that's probably too much emphasis. Inline and in parentheses sounds fine. In addition, I think we should reword the explanation in 4.2.8 (The SQL Language / SQL Syntax / Value Expressions / Window Functions). Instead of that rather long (and IMHO hard to read) paragraph about possible frame clauses and their behaviour in the presence or absence of an ORDER BY clause, we should go with a more algorithmic explanation I think. Something along these lines maybe: -- .) PARTITION BY splits the rows into disjoint partitions. All further processing happens only inside a single partition .) In RANGE mode, ORDER BY then splits each partition into an ordered list of sub-partitions, each containing rows which the ORDER BY considers to be equivalent. .) In ROWS mode, OTOH, each sub-partition contains only a single row. Thus, if there are rows which are considered to be equivalent by the ORDER BY, the ordering of the sub-partition isn't fully determined. .) Each row's frame then consists of some consecutive range of sub-partitions. .) In RANGE mode, that consecutive range can only start at either the first sub-partition or the current row's sub-partition, and can only end at either the current row's sub-partition or the last sub-partitions. .) In ROWS mode, the consecutive range may additional start n sub-partitions (or rows, it's the same thing here) before the current row, and may additionally end m sub-partitions/rows after the current row. From that, it follows that even with an underspecified sort order, the contents of each frame are still fully determined in RANGE mode. The ordering of rows within a frame is not determined, though. So overall, in RANGE mode, a query's result is only non-deterministic if the window function is sensitive to the ordering of rows within a frame. In ROWS mode, OTOH, the contents each frame themselves are not fully determined, so even an ordering agnostic window function may produce non-deterministic results. -- If you think that something along these lines would be an improvement, I can try to come up with a patch. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
On Fri, Oct 14, 2011 at 9:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: We could hack around this by adding more columns to the result so that an index-only scan doesn't work. But I wonder whether it wouldn't be smarter to add ORDER BY clauses to the window function calls. I've been known to argue against adding just-in-case ORDER BYs to the regression tests in the past; but these cases bother me more because a plan change will not just rearrange the result rows but change their contents, making it really difficult to verify that nothing's seriously wrong. I'm not sure if it applies to this case but I recall I was recently running queries on Oracle that included window functions and it wouldn't even let me run them without ORDER BY clauses in the window definition. I don't know if it cleverly determines that the ORDER BY will change the results or if Oracle just requires ORDER BY on all window definitions or what. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
On Oct16, 2011, at 20:04 , Greg Stark wrote: On Fri, Oct 14, 2011 at 9:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: We could hack around this by adding more columns to the result so that an index-only scan doesn't work. But I wonder whether it wouldn't be smarter to add ORDER BY clauses to the window function calls. I've been known to argue against adding just-in-case ORDER BYs to the regression tests in the past; but these cases bother me more because a plan change will not just rearrange the result rows but change their contents, making it really difficult to verify that nothing's seriously wrong. I'm not sure if it applies to this case but I recall I was recently running queries on Oracle that included window functions and it wouldn't even let me run them without ORDER BY clauses in the window definition. I don't know if it cleverly determines that the ORDER BY will change the results or if Oracle just requires ORDER BY on all window definitions or what. I was about to point out that whether or not ORDER BY is required probably ought to depend on whether the window function acts on the frame, or the whole partition. And that oracle quite likely knows that for the individual window functions while we don't. But, actually, our documentation states in 3.5, Window Functions, that By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition. Either I'm confused, or that doesn't match the observed regression test failure. I did a few experiments. Assume that table d contains three rows with v=1, v=2 and v=3. This case seems to work (the frame is always the whole partition, even though the frame_clause says between unbounded preceding and current row): postgres=# select array_agg(v) over (range between unbounded preceding and current row) from d; array_agg --- {1,2,3} {1,2,3} {1,2,3} Once one adds an ORDER BY, the frame_clause is taken into account, so that works too: postgres=# select array_agg(v) over (order by v range between unbounded preceding and current row) from d; array_agg --- {1} {1,2} {1,2,3} But some frame clauses (row 1 preceding, for example) have an effect despite there being no ORDER BY, like here: postgres=# select array_agg(v) over (rows 1 preceding) from d; array_agg --- {1} {1,2} {2,3} ISTM that we probably should ignore frame clauses, unless there's an ORDER BY. Or maybe even throw an error? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
Hitoshi Harada umi.tan...@gmail.com writes: 2011/10/15 Tom Lane t...@sss.pgh.pa.us: I can't recall whether there was some good reason for underspecifying these test queries. It looks like all the problematic ones were added in commit ec4be2ee6827b6bd85e0813c7a8993cfbb0e6fa7 Extend the set of frame options supported for window functions, which means it was either me or Hitoshi-san who wrote them that way, but memory is not serving this afternoon. I don't remember if I wrote that part or not, but I like to add explicit ORDER BY to the test cases. It doesn't appear that some reason stopped us to specify it. So +1 for adding the clauses. I looked at this more closely and realized that the reason for doing it like that was to test window frames defined using ROWS rather than RANGE. If we fully specify the window function's input ordering then there's no very interesting distinction between the two, because no rows will have any peers. So adding ORDER BY would in fact reduce the scope of the tests. At this point I'm inclined to leave it alone. Maybe we could think of some other test cases (perhaps using some other function than SUM) which would both exercise the difference between RANGE and ROWS mode, and not be sensitive to the detailed input ordering. But I doubt it's really worth the trouble. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
Florian Pflug f...@phlo.org writes: But some frame clauses (row 1 preceding, for example) have an effect despite there being no ORDER BY, like here: Yeah, why did you expect differently? Without ORDER BY, all rows are peers in the frame ordering, so there's no way for a RANGE spec to select less than the whole partition. But with ROWS, you can select less than that. In general it's not that hard to create nondeterministic window-function queries, since the SQL standard doesn't require you to specify a unique ordering for the window function's input rows. Even in RANGE mode there are plenty of functions that are sensitive to the exact ordering, eg first_value/last_value. I guess the committee felt that locking this down would restrict the feature too much. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
On Oct17, 2011, at 00:14 , Tom Lane wrote: Florian Pflug f...@phlo.org writes: But some frame clauses (row 1 preceding, for example) have an effect despite there being no ORDER BY, like here: Yeah, why did you expect differently? Without ORDER BY, all rows are peers in the frame ordering, so there's no way for a RANGE spec to select less than the whole partition. But with ROWS, you can select less than that. I was confused by When an aggregate function is used as a window function, it aggregates over the rows within the current row's window frame. [3.5, Window Functions]. combined with the part I quoted before, which was By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition. [9.19, Window Functions, Last Paragraph] But, reading those parts again, I realize the it says When ORDER BY is omitted the *default* frame consists ... , and that the second quote is followed by a footnote which says There are options to define the window frame in other ways, but this tutorial does not cover them. See Section 4.2.8 for details. [3.5, Window Functions] So it was just me being thick. Sorry for the noise. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
Florian Pflug f...@phlo.org writes: ... reading those parts again, I realize the it says When ORDER BY is omitted the *default* frame consists ... , and that the second quote is followed by a footnote which says There are options to define the window frame in other ways, but this tutorial does not cover them. See Section 4.2.8 for details. [3.5, Window Functions] So it was just me being thick. Sorry for the noise. Hmm. Maybe the use of a footnote there is too subtle, and we should instead have that text in-line (probably in parentheses)? Or we could use a note, but that's probably too much emphasis. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
2011/10/17 Tom Lane t...@sss.pgh.pa.us: Hitoshi Harada umi.tan...@gmail.com writes: 2011/10/15 Tom Lane t...@sss.pgh.pa.us: I can't recall whether there was some good reason for underspecifying these test queries. It looks like all the problematic ones were added in commit ec4be2ee6827b6bd85e0813c7a8993cfbb0e6fa7 Extend the set of frame options supported for window functions, which means it was either me or Hitoshi-san who wrote them that way, but memory is not serving this afternoon. I don't remember if I wrote that part or not, but I like to add explicit ORDER BY to the test cases. It doesn't appear that some reason stopped us to specify it. So +1 for adding the clauses. I looked at this more closely and realized that the reason for doing it like that was to test window frames defined using ROWS rather than RANGE. If we fully specify the window function's input ordering then there's no very interesting distinction between the two, because no rows will have any peers. So adding ORDER BY would in fact reduce the scope of the tests. At this point I'm inclined to leave it alone. Maybe we could think of some other test cases (perhaps using some other function than SUM) which would both exercise the difference between RANGE and ROWS mode, and not be sensitive to the detailed input ordering. But I doubt it's really worth the trouble. Ah, you mentioned about ORDER BY in window specification (OVER clause). I thought it was query's ORDER BY. Yes, it affects in RANGE case, and we don't have rich frame support of RANGE (like n PRECEDING ...) so the case ORDER BY affects result is limited. Agree with leaving it alone. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
2011/10/17 Greg Stark st...@mit.edu: On Fri, Oct 14, 2011 at 9:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: We could hack around this by adding more columns to the result so that an index-only scan doesn't work. But I wonder whether it wouldn't be smarter to add ORDER BY clauses to the window function calls. I've been known to argue against adding just-in-case ORDER BYs to the regression tests in the past; but these cases bother me more because a plan change will not just rearrange the result rows but change their contents, making it really difficult to verify that nothing's seriously wrong. I'm not sure if it applies to this case but I recall I was recently running queries on Oracle that included window functions and it wouldn't even let me run them without ORDER BY clauses in the window definition. I don't know if it cleverly determines that the ORDER BY will change the results or if Oracle just requires ORDER BY on all window definitions or what. AFAIK, the current standard doesn't tell clearly if all/some window functions require ORDER BY clause in window specifications. Some window functions like rank and row_number is meaningless if it is omitted, so some implementation doesn't allow it omitted. And I believe Oracle implemented it before the standard, so that'd be why details are different from spec. We designed it per spec and omitting the clause doesn't violate any part of the standard. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Underspecified window queries in regression tests
So I'm testing a patch to make the planner use measured all-visible-page counts for index-only scans. I was expecting to possibly see some plan changes in the regression tests, but this diff scared me: *** *** 906,921 FROM tenk1 WHERE unique1 10; sum | unique1 -+- !4 | 4 !6 | 2 !3 | 1 !7 | 6 ! 15 | 9 ! 17 | 8 ! 13 | 5 !8 | 3 ! 10 | 7 !7 | 0 (10 rows) CREATE TEMP VIEW v_window AS --- 906,921 FROM tenk1 WHERE unique1 10; sum | unique1 -+- !0 | 0 !1 | 1 !3 | 2 !5 | 3 !7 | 4 !9 | 5 ! 11 | 6 ! 13 | 7 ! 15 | 8 ! 17 | 9 (10 rows) CREATE TEMP VIEW v_window AS On inspection, though, there's no bug. The plan did change, and that affected the order in which the rows are fetched, and that changed the window function outputs because this test case is effectively using SUM(x) OVER (ROWS 1 PRECEDING) without any ordering specifier. There are several adjacent tests that are underspecified in the same way, but their results didn't change because they aren't candidates for index-only scans. We could hack around this by adding more columns to the result so that an index-only scan doesn't work. But I wonder whether it wouldn't be smarter to add ORDER BY clauses to the window function calls. I've been known to argue against adding just-in-case ORDER BYs to the regression tests in the past; but these cases bother me more because a plan change will not just rearrange the result rows but change their contents, making it really difficult to verify that nothing's seriously wrong. I can't recall whether there was some good reason for underspecifying these test queries. It looks like all the problematic ones were added in commit ec4be2ee6827b6bd85e0813c7a8993cfbb0e6fa7 Extend the set of frame options supported for window functions, which means it was either me or Hitoshi-san who wrote them that way, but memory is not serving this afternoon. Opinions? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Underspecified window queries in regression tests
2011/10/15 Tom Lane t...@sss.pgh.pa.us: I can't recall whether there was some good reason for underspecifying these test queries. It looks like all the problematic ones were added in commit ec4be2ee6827b6bd85e0813c7a8993cfbb0e6fa7 Extend the set of frame options supported for window functions, which means it was either me or Hitoshi-san who wrote them that way, but memory is not serving this afternoon. I don't remember if I wrote that part or not, but I like to add explicit ORDER BY to the test cases. It doesn't appear that some reason stopped us to specify it. So +1 for adding the clauses. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers