Re: [HACKERS] Underspecified window queries in regression tests

2012-08-16 Thread Bruce Momjian

I have used your notes below to rewrite the Window function SQL manual
section.  As you said, it was very hard to read.  I now understand it
better, having restructured it, and I hope others do too.

After waiting 30 minutes for our developer doc build to refresh, I am
giving up and posting my own URL for the doc changes:

http://momjian.us/tmp/pgsql/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

Perhaps I need to go back to having my own doc build.

---

On Mon, Oct 17, 2011 at 11:48:38AM +0200, Florian Pflug wrote:
 On Oct17, 2011, at 01:09 , Tom Lane wrote:
  Florian Pflug f...@phlo.org writes:
  ... reading those parts again, I realize the it says When ORDER BY is 
  omitted
  the *default* frame consists ... , and that the second quote is followed
  by a footnote which says
  
   There are options to define the window frame in other ways, but this 
  tutorial
   does not cover them. See Section 4.2.8 for details. [3.5, Window 
  Functions]
  
  So it was just me being thick. Sorry for the noise.
  
  Hmm.  Maybe the use of a footnote there is too subtle, and we should
  instead have that text in-line (probably in parentheses)?  Or we could
  use a note, but that's probably too much emphasis.
 
 Inline and in parentheses sounds fine.
 
 In addition, I think we should reword the explanation in 4.2.8 (The SQL 
 Language
 / SQL Syntax / Value Expressions / Window Functions). Instead of that rather
 long (and IMHO hard to read) paragraph about possible frame clauses and their
 behaviour in the presence or absence of an ORDER BY clause, we should go with
 a more algorithmic explanation I think.
 
 Something along these lines maybe:
 
 --
 .) PARTITION BY splits the rows into disjoint partitions. All further 
 processing
happens only inside a single partition
 
 .) In RANGE mode, ORDER BY then splits each partition into an ordered list of
sub-partitions, each containing rows which the ORDER BY considers to be
equivalent.
 
 .) In ROWS mode, OTOH, each sub-partition contains only a single row. Thus, if
there are rows which are considered to be equivalent by the ORDER BY, the
ordering of the sub-partition isn't fully determined.
 
 .) Each row's frame then consists of some consecutive range of sub-partitions.
 
 .) In RANGE mode, that consecutive range can only start at either the first
sub-partition or the current row's sub-partition, and can only end at 
 either
the current row's sub-partition or the last sub-partitions.
 
 .) In ROWS mode, the consecutive range may additional start n sub-partitions
(or rows, it's the same thing here) before the current row, and may 
 additionally
end m sub-partitions/rows after the current row.
 
 From that, it follows that even with an underspecified sort order, the 
 contents of
 each frame are still fully determined in RANGE mode. The ordering of rows 
 within
 a frame is not determined, though. So overall, in RANGE mode, a query's 
 result is
 only non-deterministic if the window function is sensitive to the ordering of 
 rows
 within a frame.
 
 In ROWS mode, OTOH, the contents each frame themselves are not fully 
 determined,
 so even an ordering agnostic window function may produce non-deterministic 
 results.
 --
 
 If you think that something along these lines would be an improvement, I can 
 try
 to come up with a patch.
 
 best regards,
 Florian Pflug
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Underspecified window queries in regression tests

2011-10-17 Thread Florian Pflug
On Oct17, 2011, at 01:09 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 ... reading those parts again, I realize the it says When ORDER BY is 
 omitted
 the *default* frame consists ... , and that the second quote is followed
 by a footnote which says
 
  There are options to define the window frame in other ways, but this 
 tutorial
  does not cover them. See Section 4.2.8 for details. [3.5, Window Functions]
 
 So it was just me being thick. Sorry for the noise.
 
 Hmm.  Maybe the use of a footnote there is too subtle, and we should
 instead have that text in-line (probably in parentheses)?  Or we could
 use a note, but that's probably too much emphasis.

Inline and in parentheses sounds fine.

In addition, I think we should reword the explanation in 4.2.8 (The SQL Language
/ SQL Syntax / Value Expressions / Window Functions). Instead of that rather
long (and IMHO hard to read) paragraph about possible frame clauses and their
behaviour in the presence or absence of an ORDER BY clause, we should go with
a more algorithmic explanation I think.

Something along these lines maybe:

--
.) PARTITION BY splits the rows into disjoint partitions. All further processing
   happens only inside a single partition

.) In RANGE mode, ORDER BY then splits each partition into an ordered list of
   sub-partitions, each containing rows which the ORDER BY considers to be
   equivalent.

.) In ROWS mode, OTOH, each sub-partition contains only a single row. Thus, if
   there are rows which are considered to be equivalent by the ORDER BY, the
   ordering of the sub-partition isn't fully determined.

.) Each row's frame then consists of some consecutive range of sub-partitions.

.) In RANGE mode, that consecutive range can only start at either the first
   sub-partition or the current row's sub-partition, and can only end at either
   the current row's sub-partition or the last sub-partitions.

.) In ROWS mode, the consecutive range may additional start n sub-partitions
   (or rows, it's the same thing here) before the current row, and may 
additionally
   end m sub-partitions/rows after the current row.

From that, it follows that even with an underspecified sort order, the 
contents of
each frame are still fully determined in RANGE mode. The ordering of rows within
a frame is not determined, though. So overall, in RANGE mode, a query's result 
is
only non-deterministic if the window function is sensitive to the ordering of 
rows
within a frame.

In ROWS mode, OTOH, the contents each frame themselves are not fully determined,
so even an ordering agnostic window function may produce non-deterministic 
results.
--

If you think that something along these lines would be an improvement, I can try
to come up with a patch.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Underspecified window queries in regression tests

2011-10-16 Thread Greg Stark
On Fri, Oct 14, 2011 at 9:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We could hack around this by adding more columns to the result so that
 an index-only scan doesn't work.  But I wonder whether it wouldn't be
 smarter to add ORDER BY clauses to the window function calls.  I've been
 known to argue against adding just-in-case ORDER BYs to the regression
 tests in the past; but these cases bother me more because a plan change
 will not just rearrange the result rows but change their contents,
 making it really difficult to verify that nothing's seriously wrong.

I'm not sure if it applies to this case but I recall I was recently
running queries on Oracle that included window functions and it
wouldn't even let me run them without ORDER BY clauses in the window
definition. I don't know if it cleverly determines that the ORDER BY
will change the results or if Oracle just requires ORDER BY on all
window definitions or what.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Underspecified window queries in regression tests

2011-10-16 Thread Florian Pflug
On Oct16, 2011, at 20:04 , Greg Stark wrote:
 On Fri, Oct 14, 2011 at 9:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We could hack around this by adding more columns to the result so that
 an index-only scan doesn't work.  But I wonder whether it wouldn't be
 smarter to add ORDER BY clauses to the window function calls.  I've been
 known to argue against adding just-in-case ORDER BYs to the regression
 tests in the past; but these cases bother me more because a plan change
 will not just rearrange the result rows but change their contents,
 making it really difficult to verify that nothing's seriously wrong.
 
 I'm not sure if it applies to this case but I recall I was recently
 running queries on Oracle that included window functions and it
 wouldn't even let me run them without ORDER BY clauses in the window
 definition. I don't know if it cleverly determines that the ORDER BY
 will change the results or if Oracle just requires ORDER BY on all
 window definitions or what.

I was about to point out that whether or not ORDER BY is required probably
ought to depend on whether the window function acts on the frame, or the
whole partition. And that oracle quite likely knows that for the individual
window functions while we don't.

But, actually, our documentation states in 3.5, Window Functions, that

  By default, if ORDER BY is supplied then the frame consists of all rows
  from the start of the partition up through the current row, plus any
  following rows that are equal to the current row according to the ORDER BY
  clause. When ORDER BY is omitted the default frame consists of all rows
  in the partition. 

Either I'm confused, or that doesn't match the observed regression test
failure.

I did a few experiments. Assume that table d contains three rows with
v=1, v=2 and v=3.

This case seems to work (the frame is always the whole partition, even
though the frame_clause says between unbounded preceding and current row):

postgres=# select array_agg(v) over (range between unbounded preceding
 and current row) from d;
 array_agg 
---
 {1,2,3}
 {1,2,3}
 {1,2,3}

Once one adds an ORDER BY, the frame_clause is taken into account, so
that works too:

postgres=# select array_agg(v) over (order by v range between unbounded 
preceding
 and current row) from d;
 array_agg 
---
 {1}
 {1,2}
 {1,2,3}

But some frame clauses (row 1 preceding, for example) have an effect despite
there being no ORDER BY, like here:

postgres=# select array_agg(v) over (rows 1 preceding) from d;
 array_agg 
---
 {1}
 {1,2}
 {2,3}

ISTM that we probably should ignore frame clauses, unless there's an ORDER BY.
Or maybe even throw an error?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Underspecified window queries in regression tests

2011-10-16 Thread Tom Lane
Hitoshi Harada umi.tan...@gmail.com writes:
 2011/10/15 Tom Lane t...@sss.pgh.pa.us:
 I can't recall whether there was some good reason for underspecifying
 these test queries.  It looks like all the problematic ones were added in
 commit ec4be2ee6827b6bd85e0813c7a8993cfbb0e6fa7 Extend the set of frame
 options supported for window functions, which means it was either me or
 Hitoshi-san who wrote them that way, but memory is not serving this
 afternoon.

 I don't remember if I wrote that part or not, but I like to add
 explicit ORDER BY to the test cases. It doesn't appear that some
 reason stopped us to specify it. So +1 for adding the clauses.

I looked at this more closely and realized that the reason for doing it
like that was to test window frames defined using ROWS rather than
RANGE.  If we fully specify the window function's input ordering then
there's no very interesting distinction between the two, because no rows
will have any peers.  So adding ORDER BY would in fact reduce the scope
of the tests.

At this point I'm inclined to leave it alone.  Maybe we could think of
some other test cases (perhaps using some other function than SUM) which
would both exercise the difference between RANGE and ROWS mode, and not
be sensitive to the detailed input ordering.  But I doubt it's really
worth the trouble.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Underspecified window queries in regression tests

2011-10-16 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 But some frame clauses (row 1 preceding, for example) have an effect despite
 there being no ORDER BY, like here:

Yeah, why did you expect differently?  Without ORDER BY, all rows are
peers in the frame ordering, so there's no way for a RANGE spec to
select less than the whole partition.  But with ROWS, you can select
less than that.

In general it's not that hard to create nondeterministic window-function
queries, since the SQL standard doesn't require you to specify a unique
ordering for the window function's input rows.  Even in RANGE mode there
are plenty of functions that are sensitive to the exact ordering, eg
first_value/last_value.  I guess the committee felt that locking this
down would restrict the feature too much.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Underspecified window queries in regression tests

2011-10-16 Thread Florian Pflug
On Oct17, 2011, at 00:14 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 But some frame clauses (row 1 preceding, for example) have an effect despite
 there being no ORDER BY, like here:
 
 Yeah, why did you expect differently?  Without ORDER BY, all rows are
 peers in the frame ordering, so there's no way for a RANGE spec to
 select less than the whole partition.  But with ROWS, you can select
 less than that.

I was confused by

  When an aggregate function is used as a window function, it aggregates over
  the rows within the current row's window frame. [3.5, Window Functions].

combined with the part I quoted before, which was

 By default, if ORDER BY is supplied then the frame consists of all rows
 from the start of the partition up through the current row, plus any
 following rows that are equal to the current row according to the ORDER BY
 clause. When ORDER BY is omitted the default frame consists of all rows
 in the partition. [9.19, Window Functions, Last Paragraph]

But, reading those parts again, I realize the it says When ORDER BY is omitted
the *default* frame consists ... , and that the second quote is followed
by a footnote which says

  There are options to define the window frame in other ways, but this tutorial
  does not cover them. See Section 4.2.8 for details. [3.5, Window Functions]

So it was just me being thick. Sorry for the noise.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Underspecified window queries in regression tests

2011-10-16 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 ... reading those parts again, I realize the it says When ORDER BY is omitted
 the *default* frame consists ... , and that the second quote is followed
 by a footnote which says

   There are options to define the window frame in other ways, but this 
 tutorial
   does not cover them. See Section 4.2.8 for details. [3.5, Window Functions]

 So it was just me being thick. Sorry for the noise.

Hmm.  Maybe the use of a footnote there is too subtle, and we should
instead have that text in-line (probably in parentheses)?  Or we could
use a note, but that's probably too much emphasis.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Underspecified window queries in regression tests

2011-10-16 Thread Hitoshi Harada
2011/10/17 Tom Lane t...@sss.pgh.pa.us:
 Hitoshi Harada umi.tan...@gmail.com writes:
 2011/10/15 Tom Lane t...@sss.pgh.pa.us:
 I can't recall whether there was some good reason for underspecifying
 these test queries.  It looks like all the problematic ones were added in
 commit ec4be2ee6827b6bd85e0813c7a8993cfbb0e6fa7 Extend the set of frame
 options supported for window functions, which means it was either me or
 Hitoshi-san who wrote them that way, but memory is not serving this
 afternoon.

 I don't remember if I wrote that part or not, but I like to add
 explicit ORDER BY to the test cases. It doesn't appear that some
 reason stopped us to specify it. So +1 for adding the clauses.

 I looked at this more closely and realized that the reason for doing it
 like that was to test window frames defined using ROWS rather than
 RANGE.  If we fully specify the window function's input ordering then
 there's no very interesting distinction between the two, because no rows
 will have any peers.  So adding ORDER BY would in fact reduce the scope
 of the tests.

 At this point I'm inclined to leave it alone.  Maybe we could think of
 some other test cases (perhaps using some other function than SUM) which
 would both exercise the difference between RANGE and ROWS mode, and not
 be sensitive to the detailed input ordering.  But I doubt it's really
 worth the trouble.

Ah, you mentioned about ORDER BY in window specification (OVER
clause). I thought it was query's ORDER BY. Yes, it affects in RANGE
case, and we don't have rich frame support of RANGE (like n PRECEDING
...) so the case ORDER BY affects result is limited. Agree with
leaving it alone.

Regards,
-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Underspecified window queries in regression tests

2011-10-16 Thread Hitoshi Harada
2011/10/17 Greg Stark st...@mit.edu:
 On Fri, Oct 14, 2011 at 9:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We could hack around this by adding more columns to the result so that
 an index-only scan doesn't work.  But I wonder whether it wouldn't be
 smarter to add ORDER BY clauses to the window function calls.  I've been
 known to argue against adding just-in-case ORDER BYs to the regression
 tests in the past; but these cases bother me more because a plan change
 will not just rearrange the result rows but change their contents,
 making it really difficult to verify that nothing's seriously wrong.

 I'm not sure if it applies to this case but I recall I was recently
 running queries on Oracle that included window functions and it
 wouldn't even let me run them without ORDER BY clauses in the window
 definition. I don't know if it cleverly determines that the ORDER BY
 will change the results or if Oracle just requires ORDER BY on all
 window definitions or what.

AFAIK, the current standard doesn't tell clearly if all/some window
functions require ORDER BY clause in window specifications. Some
window functions like rank and row_number is meaningless if it is
omitted, so some implementation doesn't allow it omitted. And I
believe Oracle implemented it before the standard, so that'd be why
details are different from spec. We designed it per spec and omitting
the clause doesn't violate any part of the standard.

Regards,
-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Underspecified window queries in regression tests

2011-10-14 Thread Tom Lane
So I'm testing a patch to make the planner use measured all-visible-page
counts for index-only scans.  I was expecting to possibly see some plan
changes in the regression tests, but this diff scared me:

***
*** 906,921 
  FROM tenk1 WHERE unique1  10;
   sum | unique1 
  -+-
!4 |   4
!6 |   2
!3 |   1
!7 |   6
!   15 |   9
!   17 |   8
!   13 |   5
!8 |   3
!   10 |   7
!7 |   0
  (10 rows)
  
  CREATE TEMP VIEW v_window AS
--- 906,921 
  FROM tenk1 WHERE unique1  10;
   sum | unique1 
  -+-
!0 |   0
!1 |   1
!3 |   2
!5 |   3
!7 |   4
!9 |   5
!   11 |   6
!   13 |   7
!   15 |   8
!   17 |   9
  (10 rows)
  
  CREATE TEMP VIEW v_window AS

On inspection, though, there's no bug.  The plan did change, and that
affected the order in which the rows are fetched, and that changed the
window function outputs because this test case is effectively using
SUM(x) OVER (ROWS 1 PRECEDING) without any ordering specifier.
There are several adjacent tests that are underspecified in the same
way, but their results didn't change because they aren't candidates for
index-only scans.

We could hack around this by adding more columns to the result so that
an index-only scan doesn't work.  But I wonder whether it wouldn't be
smarter to add ORDER BY clauses to the window function calls.  I've been
known to argue against adding just-in-case ORDER BYs to the regression
tests in the past; but these cases bother me more because a plan change
will not just rearrange the result rows but change their contents,
making it really difficult to verify that nothing's seriously wrong.

I can't recall whether there was some good reason for underspecifying
these test queries.  It looks like all the problematic ones were added in
commit ec4be2ee6827b6bd85e0813c7a8993cfbb0e6fa7 Extend the set of frame
options supported for window functions, which means it was either me or
Hitoshi-san who wrote them that way, but memory is not serving this
afternoon.

Opinions?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Underspecified window queries in regression tests

2011-10-14 Thread Hitoshi Harada
2011/10/15 Tom Lane t...@sss.pgh.pa.us:
 I can't recall whether there was some good reason for underspecifying
 these test queries.  It looks like all the problematic ones were added in
 commit ec4be2ee6827b6bd85e0813c7a8993cfbb0e6fa7 Extend the set of frame
 options supported for window functions, which means it was either me or
 Hitoshi-san who wrote them that way, but memory is not serving this
 afternoon.


I don't remember if I wrote that part or not, but I like to add
explicit ORDER BY to the test cases. It doesn't appear that some
reason stopped us to specify it. So +1 for adding the clauses.

Regards,
-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers