Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-06-07 Thread Tom Lane
Peter Eisentraut  writes:
> On 2/4/18 13:10, Tom Lane wrote:
>> + 22013EERRCODE_INVALID_PRECEDING_FOLLOWING_SIZE 
>>   invalid_preceding_following_size

> I was checking the new error codes in PostgreSQL 11 and came across
> this.  The original name in the SQL standard is
> INVALID_PRECEDING_OR_FOLLOWING_SIZE_IN_WINDOW_FUNCTION
> which is reasonable to abbreviate, but is there a reason why we lost the
> "or"?

It seemed like a reasonable abbreviation to me.  If you disagree,
feel free to change it.

regards, tom lane



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-06-07 Thread Peter Eisentraut
On 2/4/18 13:10, Tom Lane wrote:
> diff --git a/src/backend/utils/errcodes.txt b/src/backend/utils/errcodes.txt
> index 1475bfe..9871d1e 100644
> *** a/src/backend/utils/errcodes.txt
> --- b/src/backend/utils/errcodes.txt
> *** Section: Class 22 - Data Exception
> *** 177,182 
> --- 177,183 
>   22P06EERRCODE_NONSTANDARD_USE_OF_ESCAPE_CHARACTER   
>  nonstandard_use_of_escape_character
>   22010EERRCODE_INVALID_INDICATOR_PARAMETER_VALUE 
>  invalid_indicator_parameter_value
>   22023EERRCODE_INVALID_PARAMETER_VALUE   
>  invalid_parameter_value
> + 22013EERRCODE_INVALID_PRECEDING_FOLLOWING_SIZE  
>  invalid_preceding_following_size
>   2201BEERRCODE_INVALID_REGULAR_EXPRESSION
>  invalid_regular_expression
>   2201WEERRCODE_INVALID_ROW_COUNT_IN_LIMIT_CLAUSE 
>  invalid_row_count_in_limit_clause
>   2201XEERRCODE_INVALID_ROW_COUNT_IN_RESULT_OFFSET_CLAUSE 
>  invalid_row_count_in_result_offset_clause

I was checking the new error codes in PostgreSQL 11 and came across
this.  The original name in the SQL standard is

INVALID_PRECEDING_OR_FOLLOWING_SIZE_IN_WINDOW_FUNCTION

which is reasonable to abbreviate, but is there a reason why we lost the
"or"?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-02-07 Thread Pantelis Theodosiou
On Sun, Feb 4, 2018 at 6:10 PM, Tom Lane  wrote:

> Oliver Ford  writes:
> > [ 0001-window-frame-v13.patch ]
>
> I've been hacking on this all week (with breaks for release notes) and
> have gotten it into a state that I think is close to committable.
>
> There was quite a lot I didn't like about the patch initially, notably
> that the interaction with operator classes/families was done all wrong.
> The idea is to add one support function per opclass, not jam them all
> into one opclass that breaks every rule for B-tree opclasses.  For one
> reason, with this approach there's no chance of dealing with non-default
> sort orders imposed by non-default opclasses.  (As a concrete example,
> suppose that we have two btree opclasses for complex numbers, one that
> sorts by real part and one that sorts by imaginary part.  You can write
> a well-defined in_range function for each of these, but one of them has
> to increment the real part and the other the imaginary part.)  I whacked
> that around and also wrote the missing documentation for the API spec
> for in_range functions.  The path of least resistance was to dump it
> into the nbtree/README text file, which I'm not that satisfied with;
> probably it should go in the main SGML docs, but I did not find a good
> place to put it.
>
> I also really didn't like the implementation you'd chosen in
> nodeWindowAgg.c to scan the entire partition and build an array of peer
> group lengths.  That risks running OOM with a large partition.  Even if
> the array doesn't lead to OOM, the tuplestore will spill to disk with
> nasty performance consequences.  We should try to ensure that the
> tuplestore needn't get larger than the frame, so that well-written queries
> with narrow frames can execute without spilling to disk.  So I rewrote
> that using an idea that had been speculated about in the original
> comments, but nobody had gotten to yet: add some more read pointers to
> track the frame boundaries, and advance them as needed.  I'm not really
> sure if this ends up as more or few row comparisons than the other way,
> but in any case it uses a fixed amount of memory, which is good.
>
> Also, the last patch's reimplementation of WinGetFuncArgInFrame isn't
> right: AFAICS, it results in any "relpos" that would point to a row
> in the exclusion range returning the row just after/before that range,
> which is already wrong if the exclusion range is more than one row,
> plus it doesn't renumber the rows beyond the exclusion.  The behavior
> we want is that the frame rows surviving after exclusion should appear
> consecutively numbered.  (This could be exposed with some tests using
> nth_value.)  I think the attached rewrite gets this right.  Also, punting
> entirely on the set-mark problem for SEEK_TAIL cases doesn't satisfy me,
> for the same reason as above that we don't want the tuplestore to bloat.
> What I did below is to set the mark at the frame start, which at least
> gives an opportunity for efficient queries.
>
> I hacked around on various other things too, for instance the behavior
> for null values in RANGE mode didn't seem to be per spec.
>
> I'm reasonably happy with all the code now, though surely it could use
> another look by someone else.  I've not yet reviewed the docs (other than
> the implementor-oriented details I added), nor have I really looked at the
> test cases.  I do have a couple suggestions on the test cases: for one,
> rather than duplicating the same window definition N times in each query,
> use one WINDOW clause and reference it with "OVER windowname".  Also,
> adding a bunch of columns of different types to a single table seems like
> a messy and not easily extensible way of testing different data types.
> I'd suggest leaving the existing table alone and adding a new test table
> per additional data type you want to test, so that there's an easy
> template for testing future additions of more in_range support.
>
> BTW, something I've not done here but am strongly tempted to do is
> run around and change all the uses of "RANGE value PRECEDING/FOLLOWING"
> terminology to, say, "RANGE offset PRECEDING/FOLLOWING".  "value" is
> just way too generic a term for this situation, making documentation
> confusing, plus you end up contorting sentences to avoid constructions
> like "value of the value".  I'm not wedded to "offset" if somebody's got a
> better word, but let's try to pick something more specific than "value".
> (In the ROWS and GROUPS cases, maybe write "count"?  Not entirely sure
> what to do for text that's trying to address all three cases, though.)
>
>
What about "extent_size" or just "size"? I see the SQL spec refers to
"preceding or following size" in an error message: ("data exception —
invalid preceding or following size in window function" )

Best regards
Pantelis Theodosiou


Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-02-06 Thread Tom Lane
I wrote:
> [ 0001-window-frame-v14.patch ]

Pushed after further hacking on the documentation and test cases.
I went ahead with the "value" to "offset" terminology change, too.

You mentioned upthread that you were interested in adding more in_range
support functions.  I think it'd be a great idea to get that done for v11,
because according to my reading of the SQL spec, it expects "RANGE offset
PRECEDING/FOLLOWING" to work for any numeric type.  See SQL:2011 7.11
 syntax rule 11-a-iii:

iii) The declared type of [the sort column] shall be numeric,
datetime, or interval. The declared type of [the offset] shall be
numeric if the declared type of SK is numeric; otherwise, it shall
be an interval type that ...

So we need in_range functions for float4, float8, and numeric if we
really want to claim with a straight face that we cover all of SQL:2011
here.  I think that ought to be a small enough addition to deal with in
the final v11 commitfest, if you have time to prepare a patch this
month.

regards, tom lane



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-02-02 Thread David G. Johnston
On Fri, Feb 2, 2018 at 9:26 AM, Oliver Ford  wrote:

> On Thu, Feb 1, 2018 at 1:46 AM, David G. Johnston
>  wrote:
>
> > The three callers of WinGetFuncArgInFrame don't use the isout argument;
> they
> > probably need to read that and a new isexcluded argument.  Start at the
> > head, loop until isout = true || isexcluded = false.
>
> The patch takes a slightly different approach and puts the logic in
> WinGetFuncArgInFrame.
> The "row_is_in_frame" function now returns a specific return code for
> when an Exclude
> clause was matched.


​I would suggest adding constants for the 4 possible results from
row_is_in_frame.

David J.


Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-31 Thread David G. Johnston
On Wed, Jan 31, 2018 at 5:06 PM, Tom Lane  wrote:

> We could imagine reimplementing WinGetFuncArgInFrame to fix this, but
> aside from the sheer inefficiency of simple fixes, I'm not very clear
> what seeking relative to WINDOW_SEEK_CURRENT should mean when the current
> row is excluded.  (Of course, the current row could have been out of frame
> before too.  Maybe we should just get rid of WINDOW_SEEK_CURRENT?)
>
>
​The exclusion clause is frame-specific and none of the three frame callers
use WINDOW_SEEK_CURRENT (only the single partition caller does).  So unless
there is an external code concern removing WINDOW_SEEK_CURRENT from being
valid for WinGetFuncArgInFrame seems straight forward and probably could be
done to remove dead code whether frame exclusion is implemented or not.
And it should remain dead since, as you say, in a frame context the current
row may not be represented even today.

The three ​callers of WinGetFuncArgInFrame don't use the isout argument;
they probably need to read that and a new isexcluded argument.  Start at
the head, loop until isout = true || isexcluded = false.

You could create a partition/frame that retains its contiguous property but
you then need to map multiple original row positions onto the single frame
rows that denote the head and tail positions for each.  This seems
considerably more bug-prone; but I don't really have a feel for how
sheer-ly inefficient the iteration would be (assuming it is even plausible).

I do think moving that decision and code to a separate patch would be a
good separation of work.

The obvious use case for me (I haven't tried hard here) would be something
like the question: compare my value to the average value of the 4 previous
and 4 subsequent records.

Implementing the standard is a plus - though agreed that the implementation
itself makes a difference.  With the iterative approach the complexity
seems manageable and performance paid for only by the user of the feature.

David J.


Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-31 Thread Tom Lane
Oliver Ford  writes:
> [ 0001-window-frame-v11.patch ]

I've realized that the exclusion clause aspect of this patch is rather
badly broken.  In particular, the "seek to row" logic in
WinGetFuncArgInFrame is critically dependent on the assumption that the
rows of the frame are contiguous.  Use of an EXCLUDE option makes them
not contiguous, but that doesn't mean you can just return NULL if the
seek hits one of the excluded rows.  The way the spec is written, it's
pretty clear that e.g. first_value() should be the value from the first
row that survives all exclusions.  But as this is coded, if the first
row that'd otherwise be in frame is excluded by EXCLUDE, you'll get
NULL, not the value from the first row that isn't excluded.  An example
of getting the wrong results:

regression=# select x, first_value(x) over (order by x rows between
current row and 1 following exclude current row)
from generate_series(1,10) x;
 x  | first_value 
+-
  1 |
  2 |
  3 |
  4 |
  5 |
  6 |
  7 |
  8 |
  9 |
 10 |
(10 rows)

We could imagine reimplementing WinGetFuncArgInFrame to fix this, but
aside from the sheer inefficiency of simple fixes, I'm not very clear
what seeking relative to WINDOW_SEEK_CURRENT should mean when the current
row is excluded.  (Of course, the current row could have been out of frame
before too.  Maybe we should just get rid of WINDOW_SEEK_CURRENT?)

I'm a bit tempted to rip out the exclusion-clause support and leave the
topic to be revisited later.  It'd have been better done as a separate
patch anyhow IMO, since it seems quite orthogonal to the RANGE or GROUPS
options.  (And TBH, given the lack of field demand for it, I'm not sure
that we want to pay a complexity and performance price for it.)

regards, tom lane



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-30 Thread Erik Rijkers

Once more trying to attach the  regression.diffs


On 2018-01-30 17:31, Erik Rijkers wrote:

On 2018-01-30 17:08, Oliver Ford wrote:
On Tue, Jan 30, 2018 at 10:48 AM, Oliver Ford  
wrote:


I will send out v10 soon with the desc functions removed and the
EXCLUDE_NO_OTHERS define removed.


Here it is. Exclude No Others is still in the parser, but does
nothing. All desc functions are removed, replaced with a sortByAsc
bool. It no longer changes catversion.


There must be a small difference here but I don't even see it...

Sorry to be bothering you with these tiny things :)

thanks,

Erik Rijkers
*** /var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/expected/window.out	2018-01-30 17:13:41.463633724 +0100
--- /var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/results/window.out	2018-01-30 17:16:59.607871830 +0100
***
*** 1172,1178 
  (10 rows)
  
  SELECT pg_get_viewdef('v_window');
!  pg_get_viewdef
  ---
SELECT i.i, +
   sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+
--- 1172,1178 
  (10 rows)
  
  SELECT pg_get_viewdef('v_window');
! pg_get_viewdef 
  ---
SELECT i.i, +
   sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+
***
*** 1198,1204 
  (10 rows)
  
  SELECT pg_get_viewdef('v_window');
!   pg_get_viewdef 
  -
SELECT i.i,   +
   sum(i.i) OVER (ORDER BY i.i GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+
--- 1198,1204 
  (10 rows)
  
  SELECT pg_get_viewdef('v_window');
!  pg_get_viewdef  
  -
SELECT i.i,   +
   sum(i.i) OVER (ORDER BY i.i GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+

==



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-30 Thread Erik Rijkers

On 2018-01-30 17:08, Oliver Ford wrote:

On Tue, Jan 30, 2018 at 10:48 AM, Oliver Ford  wrote:


I will send out v10 soon with the desc functions removed and the
EXCLUDE_NO_OTHERS define removed.


Here it is. Exclude No Others is still in the parser, but does
nothing. All desc functions are removed, replaced with a sortByAsc
bool. It no longer changes catversion.


There must be a small difference here but I don't even see it...

Sorry to be bothering you with these tiny things :)

thanks,

Erik Rijkers




Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-30 Thread Oliver Ford
On Tuesday, 30 January 2018, Tom Lane  wrote:

> Another thing I'm a little confused by is the precise API for the in_range
> support functions (the lack of any documentation for it doesn't help).
> I wonder why you chose to provide two support functions per datatype
> combination rather than one with an additional boolean argument.
> In fact, it almost seems like the "end" flag could already do the
> job, though I may be missing something.  As-is, it seems like this
> setup involves a lot of duplicate code and catalog entries ... what
> are we gaining from that?
>
> regards, tom lane
>

We could instead remove the "desc" functions and flip the values of both
"preceding" and "end" for a descending order. It just needs an extra bool
in the parsenode/plannode structs to send to nodeWindowAgg.

I used two functions because it seemed cleaner to me to get the Oid of the
function in the parser for both ordering types, so then nodeWindowAgg
doesn't have to know about sort order (doesn't have to have extra
conditionals for the two). But yes it does increase the catalog and code
size so is probably better removed.

I will send out v10 soon with the desc functions removed and the
EXCLUDE_NO_OTHERS define removed.


Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-29 Thread Oliver Ford
On Monday, 29 January 2018, Tom Lane  wrote:

> Oliver Ford  writes:
> > On Monday, 29 January 2018, Tom Lane  wrote:
> >> I've started to go through this in some detail, and I'm wondering why
> >> you invented a FRAMEOPTION_EXCLUDE_NO_OTHERS option bit rather than
> >> just representing that choice as default (0).
>
> > My guess is that it's a little like putting "ORDER BY x ASC" when ASC is
> > usually default behavior - it adds some documentation, perhaps for people
> > new to SQL or to make your intention more explicit. That's the only
> reason
> > I can think of as to why the standards committee included it.
>
> Yeah, they like to do that.  And "ORDER BY x ASC" is actually a precise
> precedent, because we don't print ASC either, cf get_rule_orderby().
>
> regards, tom lane
>

I would strongly suggest taking it out entirely then. There really doesn't
seem a point in adding a new keyword and a new condition in the grammar if
it is going to do absolutely nothing.

If anyone thinks it's useful to have I can just take it out of ruleutils
and remove its define. But personally I would remove it entirely as it's
really just clutter.


Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-29 Thread Tom Lane
Oliver Ford  writes:
> On Monday, 29 January 2018, Tom Lane  wrote:
>> I've started to go through this in some detail, and I'm wondering why
>> you invented a FRAMEOPTION_EXCLUDE_NO_OTHERS option bit rather than
>> just representing that choice as default (0).

> My guess is that it's a little like putting "ORDER BY x ASC" when ASC is
> usually default behavior - it adds some documentation, perhaps for people
> new to SQL or to make your intention more explicit. That's the only reason
> I can think of as to why the standards committee included it.

Yeah, they like to do that.  And "ORDER BY x ASC" is actually a precise
precedent, because we don't print ASC either, cf get_rule_orderby().

regards, tom lane



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-29 Thread Oliver Ford
On Monday, 29 January 2018, Tom Lane  wrote:

> Oliver Ford  writes:
> > [ 0001-window-frame-v9.patch ]
>
> I've started to go through this in some detail, and I'm wondering why
> you invented a FRAMEOPTION_EXCLUDE_NO_OTHERS option bit rather than
> just representing that choice as default (0).  As you have it, a
> window definition that explicitly specifies EXCLUDE NO OTHERS will be
> considered unequal to one that just defaults to that behavior, in
> e.g. transformWindowFuncCall().  That seems undesirable, if not
> outright wrong.  Against that, having the bit allows ruleutils.c
> to print "EXCLUDE NO OTHERS" when the input included that, but that
> seems like a wash if not an anti-feature.  We've never been big on
> making ruleutils output distinguish explicit from implicit selection
> of a default setting, and in this case it could possibly lead to
> outputting a query in a form that is not backwards-compatible to
> older PG versions, when there's no need to be incompatible.


Exclude No Others does seem pretty pointless to me, but it's in the
standard so I included it as an option that can be printed by ruleutils. I
can't imagine it being much used, but if people do want to document that
they are not excluding other rows they can do so.

My guess is that it's a little like putting "ORDER BY x ASC" when ASC is
usually default behavior - it adds some documentation, perhaps for people
new to SQL or to make your intention more explicit. That's the only reason
I can think of as to why the standards committee included it.


> If there's some other consideration I'm missing, please say what;
> otherwise I'll change it.
>
> BTW, I generally recommend not including a catversion change in
> submitted patches.  It causes merge problems any time some other
> catversion-bumping patch gets committed, and it can't possibly be
> the right value for the final commit since you aren't likely to
> be able to predict that date in advance.  It surely doesn't hurt
> to remind the committer that a catversion bump is needed, but just
> do that in the submission message.
>

Ok won't do that again.


Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-29 Thread Tom Lane
Oliver Ford  writes:
> [ 0001-window-frame-v9.patch ]

I've started to go through this in some detail, and I'm wondering why
you invented a FRAMEOPTION_EXCLUDE_NO_OTHERS option bit rather than
just representing that choice as default (0).  As you have it, a
window definition that explicitly specifies EXCLUDE NO OTHERS will be
considered unequal to one that just defaults to that behavior, in
e.g. transformWindowFuncCall().  That seems undesirable, if not
outright wrong.  Against that, having the bit allows ruleutils.c
to print "EXCLUDE NO OTHERS" when the input included that, but that
seems like a wash if not an anti-feature.  We've never been big on
making ruleutils output distinguish explicit from implicit selection
of a default setting, and in this case it could possibly lead to
outputting a query in a form that is not backwards-compatible to
older PG versions, when there's no need to be incompatible.

If there's some other consideration I'm missing, please say what;
otherwise I'll change it.

BTW, I generally recommend not including a catversion change in
submitted patches.  It causes merge problems any time some other
catversion-bumping patch gets committed, and it can't possibly be
the right value for the final commit since you aren't likely to
be able to predict that date in advance.  It surely doesn't hurt
to remind the committer that a catversion bump is needed, but just
do that in the submission message.

regards, tom lane



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-27 Thread Tom Lane
Oliver Ford  writes:
> On Sat, Jan 27, 2018 at 7:40 AM, Erik Rijkers  wrote:
>> Regression tests only succeed for assert-disabled compiles; they fail when
>> assert-enabled:

> Problem seems to be with an existing Assert in catcache.c:1545:
> Assert(nkeys > 0 && nkeys < cache->cc_nkeys);
> The "<" needs to be "<=" (and is changed in the attached patch).

No, that Assert is correct, because it's in SearchCatCacheList.
It doesn't make any sense to use SearchCatCacheList for a lookup
that specifies all of the key columns, because then you necessarily
have at most one match; you might as well use regular SearchCatCache,
which is significantly more efficient.

> AFAICT this was never a problem before purely because no code before
> this patch called SearchSysCacheList4, so they always called with
> fewer keys than the number available.

Hm, probably we shouldn't even have that macro.

regards, tom lane



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-27 Thread Erik Rijkers

On 2018-01-27 11:49, Oliver Ford wrote:

On Sat, Jan 27, 2018 at 7:40 AM, Erik Rijkers  wrote:

On 2018-01-27 00:35, Oliver Ford wrote:


Attached patch implements an extensible version of the RANGE with
values clause. It doesn't actually add any more type support than was


[...]


I've tested that the existing regression tests in previous versions
still pass, and also added new tests for descending mode.



Hi,

Regression tests only succeed for assert-disabled compiles; they fail 
when

assert-enabled:

I used (Centos 6.9):


Could you please try the attached version? It works for me with asserts 
enabled.




[0001-window-frame-v8.patch]

Yes, that fixed it, thanks.


Problem seems to be with an existing Assert in catcache.c:1545:

Assert(nkeys > 0 && nkeys < cache->cc_nkeys);

The "<" needs to be "<=" (and is changed in the attached patch).
AFAICT this was never a problem before purely because no code before
this patch called SearchSysCacheList4, so they always called with
fewer keys than the number available. But it's surely correct to
assert that the number of keys supplied is less than or equal to, not
less than, the number of keys in the cache.




Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-09 Thread Tom Lane
Oliver Ford  writes:
> On Tuesday, 9 January 2018, Tom Lane  wrote:
>> So the approach I'm imagining now is a datatype-specific support function
>> along the lines of
>>  in_range(a, b, delta) returns bool
>> Likely there are two of these, one each for the PRECEDING and FOLLOWING
>> cases.

> Would you prefer two functions, or a single function with a parameter for
> PRECEDING/FOLLOWING? Maybe:
>   in_range(a, b, delta, following) returns bool

You could do it that way too.  The two-function approach seems a little
cleaner and easier to document IMO, but it would create more catalog
bloat, so there's that.  I don't have a strong preference.

regards, tom lane



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-09 Thread Oliver Ford
On Tuesday, 9 January 2018, Tom Lane  wrote:
>
> So the approach I'm imagining now is a datatype-specific support function
> along the lines of
>
> in_range(a, b, delta) returns bool
>
> which is supposed to return true if a <= b + delta, or something along
> that line --- exact details of the definition TBD --- with the proviso
> that if b + delta would overflow then the result is automatically true.
>
> We could probably also delegate the requirement of throwing an error
> for negative delta to this function, eliminating the need for the
> datatype-independent core code to know how to tell that, which is the
> other datatype-dependent behavior needed per spec.
>
> Likely there are two of these, one each for the PRECEDING and FOLLOWING
> cases.
>
>
>
Would you prefer two functions, or a single function with a parameter for
PRECEDING/FOLLOWING? Maybe:

  in_range(a, b, delta, following) returns bool

Where following is a bool which is true if FOLLOWING was specified and
false if PRECEDING was specified?


Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-09 Thread Tom Lane
Vik Fearing  writes:
> I'm -1 on such a patch, even though I would really like this feature.

For the record, I'd really like to get this feature in too (and am
willing to help) ... but it needs to be done right.

regards, tom lane



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-09 Thread Vik Fearing
On 01/09/2018 10:59 PM, Tom Lane wrote:
> Generally speaking, Postgres tries hard to be an extensible-datatype
> system, going beyond the SQL standard's minimum requirements when
> necessary to make it so.  The reason that we don't already have RANGE
> PRECEDING/FOLLOWING support is that nobody was satisfied with only
> making it work for integers and datetimes.  There was, as I recall, code
> implementing more or less what you've got here in the original window
> function submission, and we pulled it out before committing because of
> that inadequacy.  I don't think the fact that some years have gone by
> means that we should forget about keeping the feature extensible.

I'm glad I read the thread before I replied.  My biggest complaint I had
in my head when reading the initial post was that clamping down on
specific datatypes was distinctly non-PostgreSQL-esque.

I'm -1 on such a patch, even though I would really like this feature.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-09 Thread Tom Lane
Oliver Ford  writes:
> [ 0001-window-frame-v6.patch ]

Generally speaking, Postgres tries hard to be an extensible-datatype
system, going beyond the SQL standard's minimum requirements when
necessary to make it so.  The reason that we don't already have RANGE
PRECEDING/FOLLOWING support is that nobody was satisfied with only
making it work for integers and datetimes.  There was, as I recall, code
implementing more or less what you've got here in the original window
function submission, and we pulled it out before committing because of
that inadequacy.  I don't think the fact that some years have gone by
means that we should forget about keeping the feature extensible.

One subsequent discussion about how we might make it work to project
standards was here:

https://www.postgresql.org/message-id/flat/51C3B952.60907%402ndquadrant.com

Looking back at that, I notice that we all focused on the way to identify
a suitable "+" or "-" operator, but now I'm thinking that that's not
actually a good factorization, because it'd be subject to undesirable
overflow hazards.  That is, if we have an integer sequence like

2147483640
2147483641
2147483642
2147483643
2147483644

and we operate on this with "RANGE FOLLOWING 10", that approach results
in an integer overflow when we try to calculate the limit values.  But
there's no real need for an overflow error.  Ideally, if we try to form
2147483640 + 10 and notice it's overflowed, we'd treat the bound as
+infinity, because every non-overflowed integer value must be within range.

So the approach I'm imagining now is a datatype-specific support function
along the lines of

in_range(a, b, delta) returns bool

which is supposed to return true if a <= b + delta, or something along
that line --- exact details of the definition TBD --- with the proviso
that if b + delta would overflow then the result is automatically true.

We could probably also delegate the requirement of throwing an error
for negative delta to this function, eliminating the need for the
datatype-independent core code to know how to tell that, which is the
other datatype-dependent behavior needed per spec.

Likely there are two of these, one each for the PRECEDING and FOLLOWING
cases.

As suggested in the above-mentioned thread, we could attach such functions
as support functions in the btree opclass that defines the sort order of
the window frame's ordering column, and the core code could look it up
from there.  Extensibility would come from the fact that people can define
new opclasses.  Also, I believe we could support multiple such functions
per opclass, allowing the potential to support "delta"s of different
datatypes --- pg_amproc.amproclefttype would correspond to the common
type of a and b, while pg_amproc.amprocrighttype would correspond to
the data type of delta.  We certainly need to allow delta to be a
different type from a/b just to handle the spec's timestamp cases.
I'm not sure if there's near-term value in multiple types of delta
values, but it seems easy to allow in this framework.

regards, tom lane



Re: Add RANGE with values and exclusions clauses to the Window Functions

2017-11-28 Thread David Fetter
On Tue, Nov 28, 2017 at 10:51:19AM +, Oliver Ford wrote:
> On Tue, Nov 28, 2017 at 4:38 AM, David Fetter  wrote:
> > I've taken the liberty of adding float8, somewhat mechanically.  Do
> > the docs need some change, assuming that addition is useful?
> >
> > Best,
> > David.
> > --
> > David Fetter  http://fetter.org/
> > Phone: +1 415 235 3778
> 
> The SQL:2011 standard says that the range values should only be an
> integer or interval. My understanding is therefore that the ORDER BY
> columns should only be either an integer, with integer range values -
> or a date/time, with interval range values.
> 
> I think if we go outside the standard we should leave it for another
> patch and further discussion. But maybe others would prefer to add
> support for more types even if they are non-standard?

I confess I was thinking more in terms of the use cases I recall from
back when I was training to be a scientist than the restrictions the
standard imposed.

Hapoy to make this extension separate if that's the consensus.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Add RANGE with values and exclusions clauses to the Window Functions

2017-11-28 Thread Oliver Ford
On Tue, Nov 28, 2017 at 4:38 AM, David Fetter  wrote:
> On Mon, Nov 27, 2017 at 04:55:17PM +, Oliver Ford wrote:
>> On Mon, Nov 27, 2017 at 4:40 PM, Erik Rijkers  wrote:
>> > On 2017-11-27 17:34, Erik Rijkers wrote:
>> >>
>> >> On 2017-11-27 16:01, Oliver Ford wrote:
>> >>>
>> >>> Attached is it in bare diff form.
>> >>
>> >>
>> >> [0001-window-frame-v3.patch]
>> >>
>> >> Thanks, that did indeed fix it:
>> >>
>> >> make && make check  now  ok.
>> >>
>> >> There were errors in the doc build (unmatched tags); I fixed them in
>> >> the attached doc-patch (which should go on top of yours).
>> >
>> >
>> > 0001-window-frame-v3-fixtags.diff
>> >
>> > now attached, I hope...
>> >
>>
>> Cheers here's v4 with the correct docs.
>
> I've taken the liberty of adding float8, somewhat mechanically.  Do
> the docs need some change, assuming that addition is useful?
>
> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778

The SQL:2011 standard says that the range values should only be an
integer or interval. My understanding is therefore that the ORDER BY
columns should only be either an integer, with integer range values -
or a date/time, with interval range values.

I think if we go outside the standard we should leave it for another
patch and further discussion. But maybe others would prefer to add
support for more types even if they are non-standard?



Re: Add RANGE with values and exclusions clauses to the Window Functions

2017-11-27 Thread Oliver Ford
On Mon, Nov 27, 2017 at 4:40 PM, Erik Rijkers  wrote:
> On 2017-11-27 17:34, Erik Rijkers wrote:
>>
>> On 2017-11-27 16:01, Oliver Ford wrote:
>>>
>>> Attached is it in bare diff form.
>>
>>
>> [0001-window-frame-v3.patch]
>>
>> Thanks, that did indeed fix it:
>>
>> make && make check  now  ok.
>>
>> There were errors in the doc build (unmatched tags); I fixed them in
>> the attached doc-patch (which should go on top of yours).
>
>
> 0001-window-frame-v3-fixtags.diff
>
> now attached, I hope...
>

Cheers here's v4 with the correct docs.


0001-window-frame-v4.patch
Description: Binary data


Re: Add RANGE with values and exclusions clauses to the Window Functions

2017-11-27 Thread Erik Rijkers

On 2017-11-27 17:34, Erik Rijkers wrote:

On 2017-11-27 16:01, Oliver Ford wrote:

Attached is it in bare diff form.


[0001-window-frame-v3.patch]

Thanks, that did indeed fix it:

make && make check  now  ok.

There were errors in the doc build (unmatched tags); I fixed them in
the attached doc-patch (which should go on top of yours).


0001-window-frame-v3-fixtags.diff

now attached, I hope...

--- doc/src/sgml/syntax.sgml.orig	2017-11-27 17:19:30.253810944 +0100
+++ doc/src/sgml/syntax.sgml	2017-11-27 17:20:55.515626931 +0100
@@ -1805,8 +1805,8 @@
 and the optional frame_clause
 can be one of
 
-{ RANGE | ROWS } frame_start [ frame_exclusion_clause ]
-{ RANGE | ROWS } BETWEEN frame_start AND frame_end [ frame_exclusion_clause ]
+{ RANGE | ROWS } frame_start [ frame_exclusion_clause ]
+{ RANGE | ROWS } BETWEEN frame_start AND frame_end [ frame_exclusion_clause ]
 
 where frame_start and frame_end can be
 one of
@@ -1817,7 +1817,7 @@
 value FOLLOWING
 UNBOUNDED FOLLOWING
 
-where the optional frame_exclusion_clause can be one of
+where the optional frame_exclusion_clause can be one of
 
 EXCLUDE CURRENT ROW
 EXCLUDE TIES
@@ -1888,24 +1888,24 @@
 

 The value PRECEDING and
-value FOLLOWING cases, when used
-in ROWS mode, indicate that the frame starts or ends the specified
-number of rows before or after the current row. In ROWS mode,
-value must be an integer expression not containing any variables,
+value FOLLOWING cases, when used
+in ROWS mode, indicate that the frame starts or ends the specified
+number of rows before or after the current row. In ROWS mode,
+value must be an integer expression not containing any variables,
 aggregate functions, or window functions.
-When used in RANGE mode, they indicate that the frame starts or ends when the value of
-each row's ORDER BY column is within the start value and end value bounds. In RANGE mode,
-value can be either an integer expression or a date/time interval.
-In RANGE mode, there must be exactly one ORDER BY column and if the column is an integer column,
-then value must be an integer.
-If it is a date/time column, then value must be an interval. In both modes,
+When used in RANGE mode, they indicate that the frame starts or ends when the value of
+each row's ORDER BY column is within the start value and end value bounds. In RANGE mode,
+value can be either an integer expression or a date/time interval.
+In RANGE mode, there must be exactly one ORDER BY column and if the column is an integer column,
+then value must be an integer.
+If it is a date/time column, then value must be an interval. In both modes,
 the value must not be null or negative; but it can be zero, which just selects the current row.

 

-For the frame_exclusion_clause, EXCLUDE CURRENT ROW
-excludes the current row from the frame. EXCLUDE TIES excludes any peers of the current row from the
-frame. EXCLUDE NO OTHERS does nothing, but is provided in order to optionally document the intention
+For the frame_exclusion_clause, EXCLUDE CURRENT ROW
+excludes the current row from the frame. EXCLUDE TIES excludes any peers of the current row from the
+frame. EXCLUDE NO OTHERS does nothing, but is provided in order to optionally document the intention
 not to exclude any other rows.

 


Re: Add RANGE with values and exclusions clauses to the Window Functions

2017-11-27 Thread Erik Rijkers

On 2017-11-27 16:01, Oliver Ford wrote:

Attached is it in bare diff form.


[0001-window-frame-v3.patch]

Thanks, that did indeed fix it:

make && make check  now  ok.

There were errors in the doc build (unmatched tags); I fixed them in the 
attached doc-patch (which should go on top of yours).


(In very limited testing I did not find any problems yet)


thanks,

Erik Rijkers





Re: Add RANGE with values and exclusions clauses to the Window Functions

2017-11-27 Thread Oliver Ford
On Mon, Nov 27, 2017 at 12:06 PM, Oliver Ford  wrote:
> On Fri, Nov 24, 2017 at 3:08 PM, Erikjan Rijkers  wrote:
>>   SELECT pg_get_viewdef('v_window');
>> ! pg_get_viewdef
>> ! --
>> !   SELECT i.i,+
>> !  sum(i.i) OVER (ORDER BY i.i) AS sum_rows+
>>   FROM generate_series(1, 10) i(i);
>>   (1 row)
>>
>> --- 1034,1043 
>>   (10 rows)
>>
>>   SELECT pg_get_viewdef('v_window');
>> ! pg_get_viewdef
>> !
>> ---
>> !   SELECT i.i,
>> +
>> !  sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
>> AS sum_rows+
>>   FROM generate_series(1, 10) i(i);
>>   (1 row)
>>
>>
>> This small hickup didn't prevent building an instance but obviously I
>> haven't done any real tests yet.
>>
>>
>> thanks,
>>
>>
>> Erik Rijkers
>
> After another clone and rebuild it works alright with the correct
> spacing on mine, so the attached v2 should all pass. I noticed that I
> hadn't added the exclusions clauses to the view defs code, so that's
> also in this patch with extra tests to cover it.

Sorry previous patch was in full-commit form and not just a diff.
Attached is it in bare diff form.


0001-window-frame-v3.patch
Description: Binary data


Re: Add RANGE with values and exclusions clauses to the Window Functions

2017-11-24 Thread Craig Ringer
On 24 November 2017 at 22:11, Oliver Ford  wrote:

> Adds RANGE BETWEEN with a start and end value, as well as an
> exclusions clause, to the window functions. This partially resolves
> TODO list item "Implement full support for window framing clauses".
>

Yay!

I'll try to take a look at this.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Add RANGE with values and exclusions clauses to the Window Functions

2017-11-24 Thread Erikjan Rijkers

On 2017-11-24 15:11, Oliver Ford wrote:

Adds RANGE BETWEEN with a start and end value, as well as an
exclusions clause, to the window functions. This partially resolves
TODO list item "Implement full support for window framing clauses".


[0001-window-frame-v1.patch]

(debian 8)

make check fails:

 foreign_data ... ok
 window   ... FAILED
 xmlmap   ... ok

The diff is:

$ ( cd  
/var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress && cat 
regression.diffs )
*** 
/var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/expected/window.out 
  2017-11-24 15:36:15.387573714 +0100
--- 
/var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/results/window.out 
   2017-11-24 15:38:35.290553157 +0100

***
*** 1034,1043 
  (10 rows)

  SELECT pg_get_viewdef('v_window');
! pg_get_viewdef
! --
!   SELECT i.i,+
!  sum(i.i) OVER (ORDER BY i.i) AS sum_rows+
  FROM generate_series(1, 10) i(i);
  (1 row)

--- 1034,1043 
  (10 rows)

  SELECT pg_get_viewdef('v_window');
! pg_get_viewdef
! 
---
!   SELECT i.i,  
   +
!  sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1 
FOLLOWING) AS sum_rows+

  FROM generate_series(1, 10) i(i);
  (1 row)


This small hickup didn't prevent building an instance but obviously I 
haven't done any real tests yet.



thanks,


Erik Rijkers



Add RANGE with values and exclusions clauses to the Window Functions

2017-11-24 Thread Oliver Ford
Adds RANGE BETWEEN with a start and end value, as well as an
exclusions clause, to the window functions. This partially resolves
TODO list item "Implement full support for window framing clauses".

== Specification ==

The window functions already allow a "ROWS BETWEEN start_value
PRECEDING/FOLLOWING AND end_value PRECEDING/FOLLOWING" to restrict the
number of rows within a partition that are piped into an aggregate
function based on their position before or after the current row. This
patch adds an equivalent for RANGE which restricts the rows based on
whether the _values_ of the ORDER BY column for all other rows in the
partition are within the start_value and end_value bounds. This brings
PostgreSQL to parity with Oracle, and implements a SQL:2011 standard
feature.

SQL:2011 also defines a window frame exclusion clause, which excludes
certain rows from the result. This clause doesn't seem to be
implemented in any mainstream RDBMS (MariaDb mentions that fact in its
documentation here:
https://mariadb.com/kb/en/library/window-functions-overview/ and has
it on its TODO list). This patch implements three EXCLUDE clauses
described in the standard:

EXCLUDE CURRENT ROW - excludes the current row from the result
EXCLUDE TIES - excludes identical rows from the result
EXCLUDE NO OTHERS - does nothing, is the default behavior; exists
purely to describe the intention not to exclude any other rows

The RANGE BETWEEN clause requires a single ORDER BY column which must
be either an integer or a date/time type. If the column is a date/time
type then start_value and end_value must both be an interval type. If
the column is an integer, then the values must both be integers.

== Testing ==

Tested on Windows with MinGW. All existing regression tests pass. New
tests and updated documentation is included. Tests show both the new
RANGE with values working and the exclusion clause working in both
RANGE and ROWS mode.

== Future Work ==

The standard also defines, in addition to RANGE and ROWS, a GROUPS
option with a corresponding EXCLUDE GROUP option. This also doesn't
seem to be implemented anywhere else, and I plan to implement it next.

This patch also adds some new error messages which have not been
internationalized.


0001-window-frame-v1.patch
Description: Binary data