Re: Add RANGE with values and exclusions clauses to the Window Functions
Peter Eisentraut writes: > On 2/4/18 13:10, Tom Lane wrote: >> + 22013EERRCODE_INVALID_PRECEDING_FOLLOWING_SIZE >> invalid_preceding_following_size > I was checking the new error codes in PostgreSQL 11 and came across > this. The original name in the SQL standard is > INVALID_PRECEDING_OR_FOLLOWING_SIZE_IN_WINDOW_FUNCTION > which is reasonable to abbreviate, but is there a reason why we lost the > "or"? It seemed like a reasonable abbreviation to me. If you disagree, feel free to change it. regards, tom lane
Re: Add RANGE with values and exclusions clauses to the Window Functions
On 2/4/18 13:10, Tom Lane wrote: > diff --git a/src/backend/utils/errcodes.txt b/src/backend/utils/errcodes.txt > index 1475bfe..9871d1e 100644 > *** a/src/backend/utils/errcodes.txt > --- b/src/backend/utils/errcodes.txt > *** Section: Class 22 - Data Exception > *** 177,182 > --- 177,183 > 22P06EERRCODE_NONSTANDARD_USE_OF_ESCAPE_CHARACTER > nonstandard_use_of_escape_character > 22010EERRCODE_INVALID_INDICATOR_PARAMETER_VALUE > invalid_indicator_parameter_value > 22023EERRCODE_INVALID_PARAMETER_VALUE > invalid_parameter_value > + 22013EERRCODE_INVALID_PRECEDING_FOLLOWING_SIZE > invalid_preceding_following_size > 2201BEERRCODE_INVALID_REGULAR_EXPRESSION > invalid_regular_expression > 2201WEERRCODE_INVALID_ROW_COUNT_IN_LIMIT_CLAUSE > invalid_row_count_in_limit_clause > 2201XEERRCODE_INVALID_ROW_COUNT_IN_RESULT_OFFSET_CLAUSE > invalid_row_count_in_result_offset_clause I was checking the new error codes in PostgreSQL 11 and came across this. The original name in the SQL standard is INVALID_PRECEDING_OR_FOLLOWING_SIZE_IN_WINDOW_FUNCTION which is reasonable to abbreviate, but is there a reason why we lost the "or"? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add RANGE with values and exclusions clauses to the Window Functions
On Sun, Feb 4, 2018 at 6:10 PM, Tom Lanewrote: > Oliver Ford writes: > > [ 0001-window-frame-v13.patch ] > > I've been hacking on this all week (with breaks for release notes) and > have gotten it into a state that I think is close to committable. > > There was quite a lot I didn't like about the patch initially, notably > that the interaction with operator classes/families was done all wrong. > The idea is to add one support function per opclass, not jam them all > into one opclass that breaks every rule for B-tree opclasses. For one > reason, with this approach there's no chance of dealing with non-default > sort orders imposed by non-default opclasses. (As a concrete example, > suppose that we have two btree opclasses for complex numbers, one that > sorts by real part and one that sorts by imaginary part. You can write > a well-defined in_range function for each of these, but one of them has > to increment the real part and the other the imaginary part.) I whacked > that around and also wrote the missing documentation for the API spec > for in_range functions. The path of least resistance was to dump it > into the nbtree/README text file, which I'm not that satisfied with; > probably it should go in the main SGML docs, but I did not find a good > place to put it. > > I also really didn't like the implementation you'd chosen in > nodeWindowAgg.c to scan the entire partition and build an array of peer > group lengths. That risks running OOM with a large partition. Even if > the array doesn't lead to OOM, the tuplestore will spill to disk with > nasty performance consequences. We should try to ensure that the > tuplestore needn't get larger than the frame, so that well-written queries > with narrow frames can execute without spilling to disk. So I rewrote > that using an idea that had been speculated about in the original > comments, but nobody had gotten to yet: add some more read pointers to > track the frame boundaries, and advance them as needed. I'm not really > sure if this ends up as more or few row comparisons than the other way, > but in any case it uses a fixed amount of memory, which is good. > > Also, the last patch's reimplementation of WinGetFuncArgInFrame isn't > right: AFAICS, it results in any "relpos" that would point to a row > in the exclusion range returning the row just after/before that range, > which is already wrong if the exclusion range is more than one row, > plus it doesn't renumber the rows beyond the exclusion. The behavior > we want is that the frame rows surviving after exclusion should appear > consecutively numbered. (This could be exposed with some tests using > nth_value.) I think the attached rewrite gets this right. Also, punting > entirely on the set-mark problem for SEEK_TAIL cases doesn't satisfy me, > for the same reason as above that we don't want the tuplestore to bloat. > What I did below is to set the mark at the frame start, which at least > gives an opportunity for efficient queries. > > I hacked around on various other things too, for instance the behavior > for null values in RANGE mode didn't seem to be per spec. > > I'm reasonably happy with all the code now, though surely it could use > another look by someone else. I've not yet reviewed the docs (other than > the implementor-oriented details I added), nor have I really looked at the > test cases. I do have a couple suggestions on the test cases: for one, > rather than duplicating the same window definition N times in each query, > use one WINDOW clause and reference it with "OVER windowname". Also, > adding a bunch of columns of different types to a single table seems like > a messy and not easily extensible way of testing different data types. > I'd suggest leaving the existing table alone and adding a new test table > per additional data type you want to test, so that there's an easy > template for testing future additions of more in_range support. > > BTW, something I've not done here but am strongly tempted to do is > run around and change all the uses of "RANGE value PRECEDING/FOLLOWING" > terminology to, say, "RANGE offset PRECEDING/FOLLOWING". "value" is > just way too generic a term for this situation, making documentation > confusing, plus you end up contorting sentences to avoid constructions > like "value of the value". I'm not wedded to "offset" if somebody's got a > better word, but let's try to pick something more specific than "value". > (In the ROWS and GROUPS cases, maybe write "count"? Not entirely sure > what to do for text that's trying to address all three cases, though.) > > What about "extent_size" or just "size"? I see the SQL spec refers to "preceding or following size" in an error message: ("data exception — invalid preceding or following size in window function" ) Best regards Pantelis Theodosiou
Re: Add RANGE with values and exclusions clauses to the Window Functions
I wrote: > [ 0001-window-frame-v14.patch ] Pushed after further hacking on the documentation and test cases. I went ahead with the "value" to "offset" terminology change, too. You mentioned upthread that you were interested in adding more in_range support functions. I think it'd be a great idea to get that done for v11, because according to my reading of the SQL spec, it expects "RANGE offset PRECEDING/FOLLOWING" to work for any numeric type. See SQL:2011 7.11 syntax rule 11-a-iii: iii) The declared type of [the sort column] shall be numeric, datetime, or interval. The declared type of [the offset] shall be numeric if the declared type of SK is numeric; otherwise, it shall be an interval type that ... So we need in_range functions for float4, float8, and numeric if we really want to claim with a straight face that we cover all of SQL:2011 here. I think that ought to be a small enough addition to deal with in the final v11 commitfest, if you have time to prepare a patch this month. regards, tom lane
Re: Add RANGE with values and exclusions clauses to the Window Functions
On Fri, Feb 2, 2018 at 9:26 AM, Oliver Fordwrote: > On Thu, Feb 1, 2018 at 1:46 AM, David G. Johnston > wrote: > > > The three callers of WinGetFuncArgInFrame don't use the isout argument; > they > > probably need to read that and a new isexcluded argument. Start at the > > head, loop until isout = true || isexcluded = false. > > The patch takes a slightly different approach and puts the logic in > WinGetFuncArgInFrame. > The "row_is_in_frame" function now returns a specific return code for > when an Exclude > clause was matched. I would suggest adding constants for the 4 possible results from row_is_in_frame. David J.
Re: Add RANGE with values and exclusions clauses to the Window Functions
On Wed, Jan 31, 2018 at 5:06 PM, Tom Lanewrote: > We could imagine reimplementing WinGetFuncArgInFrame to fix this, but > aside from the sheer inefficiency of simple fixes, I'm not very clear > what seeking relative to WINDOW_SEEK_CURRENT should mean when the current > row is excluded. (Of course, the current row could have been out of frame > before too. Maybe we should just get rid of WINDOW_SEEK_CURRENT?) > > The exclusion clause is frame-specific and none of the three frame callers use WINDOW_SEEK_CURRENT (only the single partition caller does). So unless there is an external code concern removing WINDOW_SEEK_CURRENT from being valid for WinGetFuncArgInFrame seems straight forward and probably could be done to remove dead code whether frame exclusion is implemented or not. And it should remain dead since, as you say, in a frame context the current row may not be represented even today. The three callers of WinGetFuncArgInFrame don't use the isout argument; they probably need to read that and a new isexcluded argument. Start at the head, loop until isout = true || isexcluded = false. You could create a partition/frame that retains its contiguous property but you then need to map multiple original row positions onto the single frame rows that denote the head and tail positions for each. This seems considerably more bug-prone; but I don't really have a feel for how sheer-ly inefficient the iteration would be (assuming it is even plausible). I do think moving that decision and code to a separate patch would be a good separation of work. The obvious use case for me (I haven't tried hard here) would be something like the question: compare my value to the average value of the 4 previous and 4 subsequent records. Implementing the standard is a plus - though agreed that the implementation itself makes a difference. With the iterative approach the complexity seems manageable and performance paid for only by the user of the feature. David J.
Re: Add RANGE with values and exclusions clauses to the Window Functions
Oliver Fordwrites: > [ 0001-window-frame-v11.patch ] I've realized that the exclusion clause aspect of this patch is rather badly broken. In particular, the "seek to row" logic in WinGetFuncArgInFrame is critically dependent on the assumption that the rows of the frame are contiguous. Use of an EXCLUDE option makes them not contiguous, but that doesn't mean you can just return NULL if the seek hits one of the excluded rows. The way the spec is written, it's pretty clear that e.g. first_value() should be the value from the first row that survives all exclusions. But as this is coded, if the first row that'd otherwise be in frame is excluded by EXCLUDE, you'll get NULL, not the value from the first row that isn't excluded. An example of getting the wrong results: regression=# select x, first_value(x) over (order by x rows between current row and 1 following exclude current row) from generate_series(1,10) x; x | first_value +- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | (10 rows) We could imagine reimplementing WinGetFuncArgInFrame to fix this, but aside from the sheer inefficiency of simple fixes, I'm not very clear what seeking relative to WINDOW_SEEK_CURRENT should mean when the current row is excluded. (Of course, the current row could have been out of frame before too. Maybe we should just get rid of WINDOW_SEEK_CURRENT?) I'm a bit tempted to rip out the exclusion-clause support and leave the topic to be revisited later. It'd have been better done as a separate patch anyhow IMO, since it seems quite orthogonal to the RANGE or GROUPS options. (And TBH, given the lack of field demand for it, I'm not sure that we want to pay a complexity and performance price for it.) regards, tom lane
Re: Add RANGE with values and exclusions clauses to the Window Functions
Once more trying to attach the regression.diffs On 2018-01-30 17:31, Erik Rijkers wrote: On 2018-01-30 17:08, Oliver Ford wrote: On Tue, Jan 30, 2018 at 10:48 AM, Oliver Fordwrote: I will send out v10 soon with the desc functions removed and the EXCLUDE_NO_OTHERS define removed. Here it is. Exclude No Others is still in the parser, but does nothing. All desc functions are removed, replaced with a sortByAsc bool. It no longer changes catversion. There must be a small difference here but I don't even see it... Sorry to be bothering you with these tiny things :) thanks, Erik Rijkers *** /var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/expected/window.out 2018-01-30 17:13:41.463633724 +0100 --- /var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/results/window.out 2018-01-30 17:16:59.607871830 +0100 *** *** 1172,1178 (10 rows) SELECT pg_get_viewdef('v_window'); ! pg_get_viewdef --- SELECT i.i, + sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+ --- 1172,1178 (10 rows) SELECT pg_get_viewdef('v_window'); ! pg_get_viewdef --- SELECT i.i, + sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+ *** *** 1198,1204 (10 rows) SELECT pg_get_viewdef('v_window'); ! pg_get_viewdef - SELECT i.i, + sum(i.i) OVER (ORDER BY i.i GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+ --- 1198,1204 (10 rows) SELECT pg_get_viewdef('v_window'); ! pg_get_viewdef - SELECT i.i, + sum(i.i) OVER (ORDER BY i.i GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+ ==
Re: Add RANGE with values and exclusions clauses to the Window Functions
On 2018-01-30 17:08, Oliver Ford wrote: On Tue, Jan 30, 2018 at 10:48 AM, Oliver Fordwrote: I will send out v10 soon with the desc functions removed and the EXCLUDE_NO_OTHERS define removed. Here it is. Exclude No Others is still in the parser, but does nothing. All desc functions are removed, replaced with a sortByAsc bool. It no longer changes catversion. There must be a small difference here but I don't even see it... Sorry to be bothering you with these tiny things :) thanks, Erik Rijkers
Re: Add RANGE with values and exclusions clauses to the Window Functions
On Tuesday, 30 January 2018, Tom Lanewrote: > Another thing I'm a little confused by is the precise API for the in_range > support functions (the lack of any documentation for it doesn't help). > I wonder why you chose to provide two support functions per datatype > combination rather than one with an additional boolean argument. > In fact, it almost seems like the "end" flag could already do the > job, though I may be missing something. As-is, it seems like this > setup involves a lot of duplicate code and catalog entries ... what > are we gaining from that? > > regards, tom lane > We could instead remove the "desc" functions and flip the values of both "preceding" and "end" for a descending order. It just needs an extra bool in the parsenode/plannode structs to send to nodeWindowAgg. I used two functions because it seemed cleaner to me to get the Oid of the function in the parser for both ordering types, so then nodeWindowAgg doesn't have to know about sort order (doesn't have to have extra conditionals for the two). But yes it does increase the catalog and code size so is probably better removed. I will send out v10 soon with the desc functions removed and the EXCLUDE_NO_OTHERS define removed.
Re: Add RANGE with values and exclusions clauses to the Window Functions
On Monday, 29 January 2018, Tom Lanewrote: > Oliver Ford writes: > > On Monday, 29 January 2018, Tom Lane wrote: > >> I've started to go through this in some detail, and I'm wondering why > >> you invented a FRAMEOPTION_EXCLUDE_NO_OTHERS option bit rather than > >> just representing that choice as default (0). > > > My guess is that it's a little like putting "ORDER BY x ASC" when ASC is > > usually default behavior - it adds some documentation, perhaps for people > > new to SQL or to make your intention more explicit. That's the only > reason > > I can think of as to why the standards committee included it. > > Yeah, they like to do that. And "ORDER BY x ASC" is actually a precise > precedent, because we don't print ASC either, cf get_rule_orderby(). > > regards, tom lane > I would strongly suggest taking it out entirely then. There really doesn't seem a point in adding a new keyword and a new condition in the grammar if it is going to do absolutely nothing. If anyone thinks it's useful to have I can just take it out of ruleutils and remove its define. But personally I would remove it entirely as it's really just clutter.
Re: Add RANGE with values and exclusions clauses to the Window Functions
Oliver Fordwrites: > On Monday, 29 January 2018, Tom Lane wrote: >> I've started to go through this in some detail, and I'm wondering why >> you invented a FRAMEOPTION_EXCLUDE_NO_OTHERS option bit rather than >> just representing that choice as default (0). > My guess is that it's a little like putting "ORDER BY x ASC" when ASC is > usually default behavior - it adds some documentation, perhaps for people > new to SQL or to make your intention more explicit. That's the only reason > I can think of as to why the standards committee included it. Yeah, they like to do that. And "ORDER BY x ASC" is actually a precise precedent, because we don't print ASC either, cf get_rule_orderby(). regards, tom lane
Re: Add RANGE with values and exclusions clauses to the Window Functions
On Monday, 29 January 2018, Tom Lanewrote: > Oliver Ford writes: > > [ 0001-window-frame-v9.patch ] > > I've started to go through this in some detail, and I'm wondering why > you invented a FRAMEOPTION_EXCLUDE_NO_OTHERS option bit rather than > just representing that choice as default (0). As you have it, a > window definition that explicitly specifies EXCLUDE NO OTHERS will be > considered unequal to one that just defaults to that behavior, in > e.g. transformWindowFuncCall(). That seems undesirable, if not > outright wrong. Against that, having the bit allows ruleutils.c > to print "EXCLUDE NO OTHERS" when the input included that, but that > seems like a wash if not an anti-feature. We've never been big on > making ruleutils output distinguish explicit from implicit selection > of a default setting, and in this case it could possibly lead to > outputting a query in a form that is not backwards-compatible to > older PG versions, when there's no need to be incompatible. Exclude No Others does seem pretty pointless to me, but it's in the standard so I included it as an option that can be printed by ruleutils. I can't imagine it being much used, but if people do want to document that they are not excluding other rows they can do so. My guess is that it's a little like putting "ORDER BY x ASC" when ASC is usually default behavior - it adds some documentation, perhaps for people new to SQL or to make your intention more explicit. That's the only reason I can think of as to why the standards committee included it. > If there's some other consideration I'm missing, please say what; > otherwise I'll change it. > > BTW, I generally recommend not including a catversion change in > submitted patches. It causes merge problems any time some other > catversion-bumping patch gets committed, and it can't possibly be > the right value for the final commit since you aren't likely to > be able to predict that date in advance. It surely doesn't hurt > to remind the committer that a catversion bump is needed, but just > do that in the submission message. > Ok won't do that again.
Re: Add RANGE with values and exclusions clauses to the Window Functions
Oliver Fordwrites: > [ 0001-window-frame-v9.patch ] I've started to go through this in some detail, and I'm wondering why you invented a FRAMEOPTION_EXCLUDE_NO_OTHERS option bit rather than just representing that choice as default (0). As you have it, a window definition that explicitly specifies EXCLUDE NO OTHERS will be considered unequal to one that just defaults to that behavior, in e.g. transformWindowFuncCall(). That seems undesirable, if not outright wrong. Against that, having the bit allows ruleutils.c to print "EXCLUDE NO OTHERS" when the input included that, but that seems like a wash if not an anti-feature. We've never been big on making ruleutils output distinguish explicit from implicit selection of a default setting, and in this case it could possibly lead to outputting a query in a form that is not backwards-compatible to older PG versions, when there's no need to be incompatible. If there's some other consideration I'm missing, please say what; otherwise I'll change it. BTW, I generally recommend not including a catversion change in submitted patches. It causes merge problems any time some other catversion-bumping patch gets committed, and it can't possibly be the right value for the final commit since you aren't likely to be able to predict that date in advance. It surely doesn't hurt to remind the committer that a catversion bump is needed, but just do that in the submission message. regards, tom lane
Re: Add RANGE with values and exclusions clauses to the Window Functions
Oliver Fordwrites: > On Sat, Jan 27, 2018 at 7:40 AM, Erik Rijkers wrote: >> Regression tests only succeed for assert-disabled compiles; they fail when >> assert-enabled: > Problem seems to be with an existing Assert in catcache.c:1545: > Assert(nkeys > 0 && nkeys < cache->cc_nkeys); > The "<" needs to be "<=" (and is changed in the attached patch). No, that Assert is correct, because it's in SearchCatCacheList. It doesn't make any sense to use SearchCatCacheList for a lookup that specifies all of the key columns, because then you necessarily have at most one match; you might as well use regular SearchCatCache, which is significantly more efficient. > AFAICT this was never a problem before purely because no code before > this patch called SearchSysCacheList4, so they always called with > fewer keys than the number available. Hm, probably we shouldn't even have that macro. regards, tom lane
Re: Add RANGE with values and exclusions clauses to the Window Functions
On 2018-01-27 11:49, Oliver Ford wrote: On Sat, Jan 27, 2018 at 7:40 AM, Erik Rijkerswrote: On 2018-01-27 00:35, Oliver Ford wrote: Attached patch implements an extensible version of the RANGE with values clause. It doesn't actually add any more type support than was [...] I've tested that the existing regression tests in previous versions still pass, and also added new tests for descending mode. Hi, Regression tests only succeed for assert-disabled compiles; they fail when assert-enabled: I used (Centos 6.9): Could you please try the attached version? It works for me with asserts enabled. [0001-window-frame-v8.patch] Yes, that fixed it, thanks. Problem seems to be with an existing Assert in catcache.c:1545: Assert(nkeys > 0 && nkeys < cache->cc_nkeys); The "<" needs to be "<=" (and is changed in the attached patch). AFAICT this was never a problem before purely because no code before this patch called SearchSysCacheList4, so they always called with fewer keys than the number available. But it's surely correct to assert that the number of keys supplied is less than or equal to, not less than, the number of keys in the cache.
Re: Add RANGE with values and exclusions clauses to the Window Functions
Oliver Fordwrites: > On Tuesday, 9 January 2018, Tom Lane wrote: >> So the approach I'm imagining now is a datatype-specific support function >> along the lines of >> in_range(a, b, delta) returns bool >> Likely there are two of these, one each for the PRECEDING and FOLLOWING >> cases. > Would you prefer two functions, or a single function with a parameter for > PRECEDING/FOLLOWING? Maybe: > in_range(a, b, delta, following) returns bool You could do it that way too. The two-function approach seems a little cleaner and easier to document IMO, but it would create more catalog bloat, so there's that. I don't have a strong preference. regards, tom lane
Re: Add RANGE with values and exclusions clauses to the Window Functions
On Tuesday, 9 January 2018, Tom Lanewrote: > > So the approach I'm imagining now is a datatype-specific support function > along the lines of > > in_range(a, b, delta) returns bool > > which is supposed to return true if a <= b + delta, or something along > that line --- exact details of the definition TBD --- with the proviso > that if b + delta would overflow then the result is automatically true. > > We could probably also delegate the requirement of throwing an error > for negative delta to this function, eliminating the need for the > datatype-independent core code to know how to tell that, which is the > other datatype-dependent behavior needed per spec. > > Likely there are two of these, one each for the PRECEDING and FOLLOWING > cases. > > > Would you prefer two functions, or a single function with a parameter for PRECEDING/FOLLOWING? Maybe: in_range(a, b, delta, following) returns bool Where following is a bool which is true if FOLLOWING was specified and false if PRECEDING was specified?
Re: Add RANGE with values and exclusions clauses to the Window Functions
Vik Fearingwrites: > I'm -1 on such a patch, even though I would really like this feature. For the record, I'd really like to get this feature in too (and am willing to help) ... but it needs to be done right. regards, tom lane
Re: Add RANGE with values and exclusions clauses to the Window Functions
On 01/09/2018 10:59 PM, Tom Lane wrote: > Generally speaking, Postgres tries hard to be an extensible-datatype > system, going beyond the SQL standard's minimum requirements when > necessary to make it so. The reason that we don't already have RANGE > PRECEDING/FOLLOWING support is that nobody was satisfied with only > making it work for integers and datetimes. There was, as I recall, code > implementing more or less what you've got here in the original window > function submission, and we pulled it out before committing because of > that inadequacy. I don't think the fact that some years have gone by > means that we should forget about keeping the feature extensible. I'm glad I read the thread before I replied. My biggest complaint I had in my head when reading the initial post was that clamping down on specific datatypes was distinctly non-PostgreSQL-esque. I'm -1 on such a patch, even though I would really like this feature. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: Add RANGE with values and exclusions clauses to the Window Functions
Oliver Fordwrites: > [ 0001-window-frame-v6.patch ] Generally speaking, Postgres tries hard to be an extensible-datatype system, going beyond the SQL standard's minimum requirements when necessary to make it so. The reason that we don't already have RANGE PRECEDING/FOLLOWING support is that nobody was satisfied with only making it work for integers and datetimes. There was, as I recall, code implementing more or less what you've got here in the original window function submission, and we pulled it out before committing because of that inadequacy. I don't think the fact that some years have gone by means that we should forget about keeping the feature extensible. One subsequent discussion about how we might make it work to project standards was here: https://www.postgresql.org/message-id/flat/51C3B952.60907%402ndquadrant.com Looking back at that, I notice that we all focused on the way to identify a suitable "+" or "-" operator, but now I'm thinking that that's not actually a good factorization, because it'd be subject to undesirable overflow hazards. That is, if we have an integer sequence like 2147483640 2147483641 2147483642 2147483643 2147483644 and we operate on this with "RANGE FOLLOWING 10", that approach results in an integer overflow when we try to calculate the limit values. But there's no real need for an overflow error. Ideally, if we try to form 2147483640 + 10 and notice it's overflowed, we'd treat the bound as +infinity, because every non-overflowed integer value must be within range. So the approach I'm imagining now is a datatype-specific support function along the lines of in_range(a, b, delta) returns bool which is supposed to return true if a <= b + delta, or something along that line --- exact details of the definition TBD --- with the proviso that if b + delta would overflow then the result is automatically true. We could probably also delegate the requirement of throwing an error for negative delta to this function, eliminating the need for the datatype-independent core code to know how to tell that, which is the other datatype-dependent behavior needed per spec. Likely there are two of these, one each for the PRECEDING and FOLLOWING cases. As suggested in the above-mentioned thread, we could attach such functions as support functions in the btree opclass that defines the sort order of the window frame's ordering column, and the core code could look it up from there. Extensibility would come from the fact that people can define new opclasses. Also, I believe we could support multiple such functions per opclass, allowing the potential to support "delta"s of different datatypes --- pg_amproc.amproclefttype would correspond to the common type of a and b, while pg_amproc.amprocrighttype would correspond to the data type of delta. We certainly need to allow delta to be a different type from a/b just to handle the spec's timestamp cases. I'm not sure if there's near-term value in multiple types of delta values, but it seems easy to allow in this framework. regards, tom lane
Re: Add RANGE with values and exclusions clauses to the Window Functions
On Tue, Nov 28, 2017 at 10:51:19AM +, Oliver Ford wrote: > On Tue, Nov 28, 2017 at 4:38 AM, David Fetterwrote: > > I've taken the liberty of adding float8, somewhat mechanically. Do > > the docs need some change, assuming that addition is useful? > > > > Best, > > David. > > -- > > David Fetter http://fetter.org/ > > Phone: +1 415 235 3778 > > The SQL:2011 standard says that the range values should only be an > integer or interval. My understanding is therefore that the ORDER BY > columns should only be either an integer, with integer range values - > or a date/time, with interval range values. > > I think if we go outside the standard we should leave it for another > patch and further discussion. But maybe others would prefer to add > support for more types even if they are non-standard? I confess I was thinking more in terms of the use cases I recall from back when I was training to be a scientist than the restrictions the standard imposed. Hapoy to make this extension separate if that's the consensus. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Add RANGE with values and exclusions clauses to the Window Functions
On Tue, Nov 28, 2017 at 4:38 AM, David Fetterwrote: > On Mon, Nov 27, 2017 at 04:55:17PM +, Oliver Ford wrote: >> On Mon, Nov 27, 2017 at 4:40 PM, Erik Rijkers wrote: >> > On 2017-11-27 17:34, Erik Rijkers wrote: >> >> >> >> On 2017-11-27 16:01, Oliver Ford wrote: >> >>> >> >>> Attached is it in bare diff form. >> >> >> >> >> >> [0001-window-frame-v3.patch] >> >> >> >> Thanks, that did indeed fix it: >> >> >> >> make && make check now ok. >> >> >> >> There were errors in the doc build (unmatched tags); I fixed them in >> >> the attached doc-patch (which should go on top of yours). >> > >> > >> > 0001-window-frame-v3-fixtags.diff >> > >> > now attached, I hope... >> > >> >> Cheers here's v4 with the correct docs. > > I've taken the liberty of adding float8, somewhat mechanically. Do > the docs need some change, assuming that addition is useful? > > Best, > David. > -- > David Fetter http://fetter.org/ > Phone: +1 415 235 3778 The SQL:2011 standard says that the range values should only be an integer or interval. My understanding is therefore that the ORDER BY columns should only be either an integer, with integer range values - or a date/time, with interval range values. I think if we go outside the standard we should leave it for another patch and further discussion. But maybe others would prefer to add support for more types even if they are non-standard?
Re: Add RANGE with values and exclusions clauses to the Window Functions
On Mon, Nov 27, 2017 at 4:40 PM, Erik Rijkerswrote: > On 2017-11-27 17:34, Erik Rijkers wrote: >> >> On 2017-11-27 16:01, Oliver Ford wrote: >>> >>> Attached is it in bare diff form. >> >> >> [0001-window-frame-v3.patch] >> >> Thanks, that did indeed fix it: >> >> make && make check now ok. >> >> There were errors in the doc build (unmatched tags); I fixed them in >> the attached doc-patch (which should go on top of yours). > > > 0001-window-frame-v3-fixtags.diff > > now attached, I hope... > Cheers here's v4 with the correct docs. 0001-window-frame-v4.patch Description: Binary data
Re: Add RANGE with values and exclusions clauses to the Window Functions
On 2017-11-27 17:34, Erik Rijkers wrote: On 2017-11-27 16:01, Oliver Ford wrote: Attached is it in bare diff form. [0001-window-frame-v3.patch] Thanks, that did indeed fix it: make && make check now ok. There were errors in the doc build (unmatched tags); I fixed them in the attached doc-patch (which should go on top of yours). 0001-window-frame-v3-fixtags.diff now attached, I hope... --- doc/src/sgml/syntax.sgml.orig 2017-11-27 17:19:30.253810944 +0100 +++ doc/src/sgml/syntax.sgml 2017-11-27 17:20:55.515626931 +0100 @@ -1805,8 +1805,8 @@ and the optional frame_clause can be one of -{ RANGE | ROWS } frame_start [ frame_exclusion_clause ] -{ RANGE | ROWS } BETWEEN frame_start AND frame_end [ frame_exclusion_clause ] +{ RANGE | ROWS } frame_start [ frame_exclusion_clause ] +{ RANGE | ROWS } BETWEEN frame_start AND frame_end [ frame_exclusion_clause ] where frame_start and frame_end can be one of @@ -1817,7 +1817,7 @@ value FOLLOWING UNBOUNDED FOLLOWING -where the optional frame_exclusion_clause can be one of +where the optional frame_exclusion_clause can be one of EXCLUDE CURRENT ROW EXCLUDE TIES @@ -1888,24 +1888,24 @@ The value PRECEDING and -value FOLLOWING cases, when used -in ROWS mode, indicate that the frame starts or ends the specified -number of rows before or after the current row. In ROWS mode, -value must be an integer expression not containing any variables, +value FOLLOWING cases, when used +in ROWS mode, indicate that the frame starts or ends the specified +number of rows before or after the current row. In ROWS mode, +value must be an integer expression not containing any variables, aggregate functions, or window functions. -When used in RANGE mode, they indicate that the frame starts or ends when the value of -each row's ORDER BY column is within the start value and end value bounds. In RANGE mode, -value can be either an integer expression or a date/time interval. -In RANGE mode, there must be exactly one ORDER BY column and if the column is an integer column, -then value must be an integer. -If it is a date/time column, then value must be an interval. In both modes, +When used in RANGE mode, they indicate that the frame starts or ends when the value of +each row's ORDER BY column is within the start value and end value bounds. In RANGE mode, +value can be either an integer expression or a date/time interval. +In RANGE mode, there must be exactly one ORDER BY column and if the column is an integer column, +then value must be an integer. +If it is a date/time column, then value must be an interval. In both modes, the value must not be null or negative; but it can be zero, which just selects the current row. -For the frame_exclusion_clause, EXCLUDE CURRENT ROW -excludes the current row from the frame. EXCLUDE TIES excludes any peers of the current row from the -frame. EXCLUDE NO OTHERS does nothing, but is provided in order to optionally document the intention +For the frame_exclusion_clause, EXCLUDE CURRENT ROW +excludes the current row from the frame. EXCLUDE TIES excludes any peers of the current row from the +frame. EXCLUDE NO OTHERS does nothing, but is provided in order to optionally document the intention not to exclude any other rows.
Re: Add RANGE with values and exclusions clauses to the Window Functions
On 2017-11-27 16:01, Oliver Ford wrote: Attached is it in bare diff form. [0001-window-frame-v3.patch] Thanks, that did indeed fix it: make && make check now ok. There were errors in the doc build (unmatched tags); I fixed them in the attached doc-patch (which should go on top of yours). (In very limited testing I did not find any problems yet) thanks, Erik Rijkers
Re: Add RANGE with values and exclusions clauses to the Window Functions
On Mon, Nov 27, 2017 at 12:06 PM, Oliver Fordwrote: > On Fri, Nov 24, 2017 at 3:08 PM, Erikjan Rijkers wrote: >> SELECT pg_get_viewdef('v_window'); >> ! pg_get_viewdef >> ! -- >> ! SELECT i.i,+ >> ! sum(i.i) OVER (ORDER BY i.i) AS sum_rows+ >> FROM generate_series(1, 10) i(i); >> (1 row) >> >> --- 1034,1043 >> (10 rows) >> >> SELECT pg_get_viewdef('v_window'); >> ! pg_get_viewdef >> ! >> --- >> ! SELECT i.i, >> + >> ! sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) >> AS sum_rows+ >> FROM generate_series(1, 10) i(i); >> (1 row) >> >> >> This small hickup didn't prevent building an instance but obviously I >> haven't done any real tests yet. >> >> >> thanks, >> >> >> Erik Rijkers > > After another clone and rebuild it works alright with the correct > spacing on mine, so the attached v2 should all pass. I noticed that I > hadn't added the exclusions clauses to the view defs code, so that's > also in this patch with extra tests to cover it. Sorry previous patch was in full-commit form and not just a diff. Attached is it in bare diff form. 0001-window-frame-v3.patch Description: Binary data
Re: Add RANGE with values and exclusions clauses to the Window Functions
On 24 November 2017 at 22:11, Oliver Fordwrote: > Adds RANGE BETWEEN with a start and end value, as well as an > exclusions clause, to the window functions. This partially resolves > TODO list item "Implement full support for window framing clauses". > Yay! I'll try to take a look at this. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Add RANGE with values and exclusions clauses to the Window Functions
On 2017-11-24 15:11, Oliver Ford wrote: Adds RANGE BETWEEN with a start and end value, as well as an exclusions clause, to the window functions. This partially resolves TODO list item "Implement full support for window framing clauses". [0001-window-frame-v1.patch] (debian 8) make check fails: foreign_data ... ok window ... FAILED xmlmap ... ok The diff is: $ ( cd /var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress && cat regression.diffs ) *** /var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/expected/window.out 2017-11-24 15:36:15.387573714 +0100 --- /var/data1/pg_stuff/pg_sandbox/pgsql.frame_range/src/test/regress/results/window.out 2017-11-24 15:38:35.290553157 +0100 *** *** 1034,1043 (10 rows) SELECT pg_get_viewdef('v_window'); ! pg_get_viewdef ! -- ! SELECT i.i,+ ! sum(i.i) OVER (ORDER BY i.i) AS sum_rows+ FROM generate_series(1, 10) i(i); (1 row) --- 1034,1043 (10 rows) SELECT pg_get_viewdef('v_window'); ! pg_get_viewdef ! --- ! SELECT i.i, + ! sum(i.i) OVER (ORDER BY i.i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_rows+ FROM generate_series(1, 10) i(i); (1 row) This small hickup didn't prevent building an instance but obviously I haven't done any real tests yet. thanks, Erik Rijkers
Add RANGE with values and exclusions clauses to the Window Functions
Adds RANGE BETWEEN with a start and end value, as well as an exclusions clause, to the window functions. This partially resolves TODO list item "Implement full support for window framing clauses". == Specification == The window functions already allow a "ROWS BETWEEN start_value PRECEDING/FOLLOWING AND end_value PRECEDING/FOLLOWING" to restrict the number of rows within a partition that are piped into an aggregate function based on their position before or after the current row. This patch adds an equivalent for RANGE which restricts the rows based on whether the _values_ of the ORDER BY column for all other rows in the partition are within the start_value and end_value bounds. This brings PostgreSQL to parity with Oracle, and implements a SQL:2011 standard feature. SQL:2011 also defines a window frame exclusion clause, which excludes certain rows from the result. This clause doesn't seem to be implemented in any mainstream RDBMS (MariaDb mentions that fact in its documentation here: https://mariadb.com/kb/en/library/window-functions-overview/ and has it on its TODO list). This patch implements three EXCLUDE clauses described in the standard: EXCLUDE CURRENT ROW - excludes the current row from the result EXCLUDE TIES - excludes identical rows from the result EXCLUDE NO OTHERS - does nothing, is the default behavior; exists purely to describe the intention not to exclude any other rows The RANGE BETWEEN clause requires a single ORDER BY column which must be either an integer or a date/time type. If the column is a date/time type then start_value and end_value must both be an interval type. If the column is an integer, then the values must both be integers. == Testing == Tested on Windows with MinGW. All existing regression tests pass. New tests and updated documentation is included. Tests show both the new RANGE with values working and the exclusion clause working in both RANGE and ROWS mode. == Future Work == The standard also defines, in addition to RANGE and ROWS, a GROUPS option with a corresponding EXCLUDE GROUP option. This also doesn't seem to be implemented anywhere else, and I plan to implement it next. This patch also adds some new error messages which have not been internationalized. 0001-window-frame-v1.patch Description: Binary data