Re: [PATCH] distinct aggregates within a window function WIP
I resolve my problem https://stackoverflow.com/a/67167595/4632019: Could it be possible PG will use `filter` trick when DISTINCT is used: `sum (distinct suma)`? This will benefit to not write second SELECT https://www.postgresql.org/message-id/CAN1PwonqojSAP_N91zO5Hm7Ta4Mdib-2YuUaEd0NP6Fn6XutzQ%40mail.gmail.com >About yours additional note, I think that it is not possible to get easy the same result with appropriate use of window framing options, Can you try next approach? My approach is https://dbfiddle.uk/?rdbms=postgres_13=01c699f3f47ca9fca8215f8cbf556218: Assign row_number at each order: row_number() over (partition by agreement_id, order_id ) as nrow Take only first suma: filter nrow = 1 with data as ( select * from (values ( 1, 1, 1, 1.0049 ), (2, 1,1,1.0049), ( 3, 1,1,1.0049 ) , ( 4, 1, 2, 1.0049 ), (5, 1,2,1.0057), ( 6, 2, 1, 1.53 ), ( 7,2,1,2.18), ( 8,2,2,3.48 ) ) t (id, agreement_id, order_id, suma) ), intermediate as (select *, row_number() over (partition by agreement_id, order_id ) as nrow, (sum( suma ) over ( partition by agreement_id, order_id ))::numeric( 10, 2) as order_suma, from data) select *, sum( order_suma ) filter (where nrow = 1) over (partition by agreement_id) from intermediate Wednesday, April 22, 2020, 10:05:19 AM, you wrote: > Hello Andreas, > Tuesday, April 21, 2020, 6:17:00 PM, you wrote: >> On 4/21/20 5:06 PM, Eugen Konkov wrote: >>> Hi. I read the thread. >>> Probably this fiddle will be helpful for testing: >>> https://dbfiddle.uk/?rdbms=postgres_12=abe845142a5099d921d3729043fb8491 >>> I recently encountered a problem: >>> Why Window-specific functions do not allow DISTINCT to be used within the >>> function argument list? >>> sum( DISTINCT order_cost ) OVER ( PARTITION BY invoice_id ORDER BY >>> invoice_id, group_id RANGE unbound preceeding and unbound following ) >>> behavior is quite deterministic: >>> ORDER BY will create peers in partition >>> DISTINCT will get only one peer >>> I resolve my problem via two subqueries, but it seems this logic may >>> be applied to window functions (did not check this for other functions >>> thought) >> Sorry, I do not follow. What problem did you encounter? > Lack of DISTINCT for window function SUM -- Best regards, Eugen Konkov
Re: [PATCH] distinct aggregates within a window function WIP
On Wed, Sep 16, 2020 at 11:35:22AM +0300, Krasiyan Andreev wrote: > I am thinking also to concentrate on Vik's patch, if it has a clear design > point of view, clear design, I can withdraw mine patch. Okay, I have done that then. -- Michael signature.asc Description: PGP signature
Re: [PATCH] distinct aggregates within a window function WIP
Thank you very much. I think that Vik Fearing's patch about "Implement for window functions" is much clear, better and has a chance to be committed. For me it's not important which patch will go into PostgreSQL, because it's a much needed feature. In mine patch, there is also a feature about using negative indexes, to be able to reverse order in exact same window frame for "FROM FIRST/FROM LAST", but I am not sure, is such non-standard usage is acceptable (it's the same as some array functions in programming language), if it's acceptable, it can be easy ported to Vik's patch. I am thinking also to concentrate on Vik's patch, if it has a clear design point of view, clear design, I can withdraw mine patch. На ср, 16.09.2020 г. в 11:19 Surafel Temesgen написа: > > On Thu, Mar 5, 2020 at 4:17 AM Krasiyan Andreev > wrote: > >> I have currently suspended development of this patch, based on it's >> review, >> but I will continue development of the other Oliver Ford's work about >> adding support of respect/ignore nulls >> for lag(),lead(),first_value(),last_value() and nth_value() and from >> first/last for nth_value() patch, >> but I am not sure how to proceed with it's implementation and any >> feedback will be very helpful. >> >> > * I applied your patch on top of 58c47ccfff20b8c125903 . It applied > cleanly , compiled, make check pass, but it have white space errors: > > *Added functions on windowfuncs.c have no comments so it's not easily > understandable. > > * Regression test addition seems huge to me. Can you reduce that? You can > use existing tables and fewer records. > > * I don’t understand why this patch has to change makeBoolAConst? It > already make “bool” constant node > > > regards > > Surafel > >
Re: [PATCH] distinct aggregates within a window function WIP
On Thu, Mar 5, 2020 at 4:17 AM Krasiyan Andreev wrote: > I have currently suspended development of this patch, based on it's > review, > but I will continue development of the other Oliver Ford's work about > adding support of respect/ignore nulls > for lag(),lead(),first_value(),last_value() and nth_value() and from > first/last for nth_value() patch, > but I am not sure how to proceed with it's implementation and any feedback > will be very helpful. > > * I applied your patch on top of 58c47ccfff20b8c125903 . It applied cleanly , compiled, make check pass, but it have white space errors: *Added functions on windowfuncs.c have no comments so it's not easily understandable. * Regression test addition seems huge to me. Can you reduce that? You can use existing tables and fewer records. * I don’t understand why this patch has to change makeBoolAConst? It already make “bool” constant node regards Surafel
Re: [PATCH] distinct aggregates within a window function WIP
Hello Andreas, Tuesday, April 21, 2020, 6:17:00 PM, you wrote: > On 4/21/20 5:06 PM, Eugen Konkov wrote: >> Hi. I read the thread. >> >> Probably this fiddle will be helpful for testing: >> >> https://dbfiddle.uk/?rdbms=postgres_12=abe845142a5099d921d3729043fb8491 >> >> I recently encountered a problem: >> Why Window-specific functions do not allow DISTINCT to be used within the >> function argument list? >> >> sum( DISTINCT order_cost ) OVER ( PARTITION BY invoice_id ORDER BY >> invoice_id, group_id RANGE unbound preceeding and unbound following ) >> >> behavior is quite deterministic: >> >> ORDER BY will create peers in partition >> DISTINCT will get only one peer >> >> I resolve my problem via two subqueries, but it seems this logic may >> be applied to window functions (did not check this for other functions >> thought) > Sorry, I do not follow. What problem did you encounter? Lack of DISTINCT for window function SUM -- Best regards, Eugen Konkov
Re: [PATCH] distinct aggregates within a window function WIP
On 4/21/20 5:06 PM, Eugen Konkov wrote: Hi. I read the thread. Probably this fiddle will be helpful for testing: https://dbfiddle.uk/?rdbms=postgres_12=abe845142a5099d921d3729043fb8491 I recently encountered a problem: Why Window-specific functions do not allow DISTINCT to be used within the function argument list? sum( DISTINCT order_cost ) OVER ( PARTITION BY invoice_id ORDER BY invoice_id, group_id RANGE unbound preceeding and unbound following ) behavior is quite deterministic: ORDER BY will create peers in partition DISTINCT will get only one peer I resolve my problem via two subqueries, but it seems this logic may be applied to window functions (did not check this for other functions thought) Sorry, I do not follow. What problem did you encounter? Andreas
[PATCH] distinct aggregates within a window function WIP
Hi. I read the thread. Probably this fiddle will be helpful for testing: https://dbfiddle.uk/?rdbms=postgres_12=abe845142a5099d921d3729043fb8491 I recently encountered a problem: Why Window-specific functions do not allow DISTINCT to be used within the function argument list? sum( DISTINCT order_cost ) OVER ( PARTITION BY invoice_id ORDER BY invoice_id, group_id RANGE unbound preceeding and unbound following ) behavior is quite deterministic: ORDER BY will create peers in partition DISTINCT will get only one peer I resolve my problem via two subqueries, but it seems this logic may be applied to window functions (did not check this for other functions thought) -- Best regards, Eugen Konkov
Re: [PATCH] distinct aggregates within a window function WIP
I have currently suspended development of this patch, based on it's review, but I will continue development of the other Oliver Ford's work about adding support of respect/ignore nulls for lag(),lead(),first_value(),last_value() and nth_value() and from first/last for nth_value() patch, but I am not sure how to proceed with it's implementation and any feedback will be very helpful. I have dropped support of from first/last for nth_value(), but also I reimplemented it in a different way, by using negative number for the position argument, to be able to get the same frame in exact reverse order. After that patch becomes much more simple and major concerns about precedence hack has gone, but maybe it can be additionally simplified. I have not renamed special bool type "ignorenulls", because it is probably not acceptable way for calling extra version of window functions (but it makes things very easy and it can reuse frames), but I removed the other special bool type "fromlast". Attached file is for PostgreSQL 13 (master git branch) and I will add it now to a March commit fest, to be able to track changes. Everything works and patch is in very good shape, all tests are passed and also, I use it from some time for SQL analysis purposes (because ignore nulls is one of the most needed feature in OLAP/BI area and Oracle, Amazon Redshift, even Informix have it). After patch review and suggestions about what to do with special bool type and unreserved keywords, I will reimplement it, if needed. На пн, 13.01.2020 г. в 18:19 Vik Fearing написа: > On 13/01/2020 15:19, Tom Lane wrote: > > Krasiyan Andreev writes: > >> I want to propose to you an old patch for Postgres 11, off-site > developed > >> by Oliver Ford, > >> but I have permission from him to publish it and to continue it's > >> development, > >> that allow distinct aggregates, like select sum(distinct nums) within a > >> window function. > > I started to respond by asking whether that's well-defined, but > > reading down further I see that that's not actually what the feature > > is: what it is is attaching DISTINCT to a window function itself. > > I'd still ask whether it's well-defined though, or even minimally > > sensible. Window functions are generally supposed to produce one > > row per input row --- how does that square with the implicit row > > merging of DISTINCT? They're also typically row-order-sensitive > > --- how does that work with DISTINCT? > > > It's a little strange because the spec says: > > > > If the window ordering clause or the window framing clause of the window > structure descriptor that describes the > is present, then no simply contained in function> shall specify DISTINCT or . > > > > So it seems to be well defined if all you have is a partition. > > > But then it also says: > > > > DENSE_RANK() OVER WNS is equivalent to the : > COUNT (DISTINCT ROW ( VE 1 , ..., VE N ) ) > OVER (WNS1 RANGE UNBOUNDED PRECEDING) > > > > And that kind of looks like a framing clause there. > > > > Also, to the extent that > > this is sensible, can't you get the same results already today > > with appropriate use of window framing options? > > > I don't see how. > > > I have sometimes wanted this feature so I am +1 on us getting at least a > minimal form of it. > > -- > > Vik > > diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 28035f1635..3d73c96891 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15702,7 +15702,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; lag(value anyelement [, offset integer - [, default anyelement ]]) + [, default anyelement ]]) [null_treatment] @@ -15731,7 +15731,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; lead(value anyelement [, offset integer - [, default anyelement ]]) + [, default anyelement ]]) [null_treatment] @@ -15757,7 +15757,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; first_value - first_value(value any) + first_value(value any) [null_treatment] same type as value @@ -15773,7 +15773,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; last_value - last_value(value any) + last_value(value any) [null_treatment] same type as value @@ -15790,7 +15790,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; nth_value - nth_value(value any, nth integer) + nth_value(value any, nth integer) [null_treatment] @@ -15806,6 +15806,16 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; + + In , null_treatment is one of: + + RESPECT NULLS + IGNORE NULLS + +
Re: [PATCH] distinct aggregates within a window function WIP
On 13/01/2020 15:19, Tom Lane wrote: > Krasiyan Andreev writes: >> I want to propose to you an old patch for Postgres 11, off-site developed >> by Oliver Ford, >> but I have permission from him to publish it and to continue it's >> development, >> that allow distinct aggregates, like select sum(distinct nums) within a >> window function. > I started to respond by asking whether that's well-defined, but > reading down further I see that that's not actually what the feature > is: what it is is attaching DISTINCT to a window function itself. > I'd still ask whether it's well-defined though, or even minimally > sensible. Window functions are generally supposed to produce one > row per input row --- how does that square with the implicit row > merging of DISTINCT? They're also typically row-order-sensitive > --- how does that work with DISTINCT? It's a little strange because the spec says: If the window ordering clause or the window framing clause of the window structure descriptor that describes the is present, then no simply contained in shall specify DISTINCT or . So it seems to be well defined if all you have is a partition. But then it also says: DENSE_RANK() OVER WNS is equivalent to the : COUNT (DISTINCT ROW ( VE 1 , ..., VE N ) ) OVER (WNS1 RANGE UNBOUNDED PRECEDING) And that kind of looks like a framing clause there. > Also, to the extent that > this is sensible, can't you get the same results already today > with appropriate use of window framing options? I don't see how. I have sometimes wanted this feature so I am +1 on us getting at least a minimal form of it. -- Vik
Re: [PATCH] distinct aggregates within a window function WIP
I understand yours note about datatype-specific operations, so I need to think more generic about it. About yours additional note, I think that it is not possible to get easy the same result with appropriate use of window framing options, because "exclude ties" will not exclude "current row" itself, only peers of it. So, that is the only difference and reason of DISTINCT aggregate. Maybe, if we can specify at the same time to "exclude ties" and "exclude current row" itself, there will not be need of DISTINCT, but right now I think that nor "exclude ties" nor "exclude groups" or "exclude current row", can specify it, because they can't be nested or used at the same time. На пн, 13.01.2020 г. в 16:19 Tom Lane написа: > Krasiyan Andreev writes: > > I want to propose to you an old patch for Postgres 11, off-site developed > > by Oliver Ford, > > but I have permission from him to publish it and to continue it's > > development, > > that allow distinct aggregates, like select sum(distinct nums) within a > > window function. > > I started to respond by asking whether that's well-defined, but > reading down further I see that that's not actually what the feature > is: what it is is attaching DISTINCT to a window function itself. > I'd still ask whether it's well-defined though, or even minimally > sensible. Window functions are generally supposed to produce one > row per input row --- how does that square with the implicit row > merging of DISTINCT? They're also typically row-order-sensitive > --- how does that work with DISTINCT? Also, to the extent that > this is sensible, can't you get the same results already today > with appropriate use of window framing options? > > > It's a WIP, because it doesn't have tests yet (I will add them later) and > > also, it works for a int, float, and numeric types, > > As a rule of thumb, operations like this should not be coded to be > datatype-specific. We threw out some features in the original window > function patch until they could be rewritten to not be limited to a > hard-coded set of data types (cf commit 0a459cec9), and I don't see > why we'd apply a lesser standard here. Certainly DISTINCT for > aggregates has no such limitation. > > regards, tom lane >
Re: [PATCH] distinct aggregates within a window function WIP
Tom Lane schrieb am 13.01.2020 um 15:19: > what it is is attaching DISTINCT to a window function itself. > I'd still ask whether it's well-defined though, or even minimally > sensible. Window functions are generally supposed to produce one > row per input row --- how does that square with the implicit row > merging of DISTINCT? They're also typically row-order-sensitive > --- how does that work with DISTINCT? Also, to the extent that > this is sensible, can't you get the same results already today > with appropriate use of window framing options? I find the example using array_agg() and cumulative window functions a bit confusing as well, but I think there are situations where having this is really helpful, e.g.: count(distinct some_column) over (partition by something) I know it's not an argument, but Oracle supports this and porting queries like that from Oracle to Postgres isn't really fun. Thomas
Re: [PATCH] distinct aggregates within a window function WIP
Krasiyan Andreev writes: > I want to propose to you an old patch for Postgres 11, off-site developed > by Oliver Ford, > but I have permission from him to publish it and to continue it's > development, > that allow distinct aggregates, like select sum(distinct nums) within a > window function. I started to respond by asking whether that's well-defined, but reading down further I see that that's not actually what the feature is: what it is is attaching DISTINCT to a window function itself. I'd still ask whether it's well-defined though, or even minimally sensible. Window functions are generally supposed to produce one row per input row --- how does that square with the implicit row merging of DISTINCT? They're also typically row-order-sensitive --- how does that work with DISTINCT? Also, to the extent that this is sensible, can't you get the same results already today with appropriate use of window framing options? > It's a WIP, because it doesn't have tests yet (I will add them later) and > also, it works for a int, float, and numeric types, As a rule of thumb, operations like this should not be coded to be datatype-specific. We threw out some features in the original window function patch until they could be rewritten to not be limited to a hard-coded set of data types (cf commit 0a459cec9), and I don't see why we'd apply a lesser standard here. Certainly DISTINCT for aggregates has no such limitation. regards, tom lane
[PATCH] distinct aggregates within a window function WIP
Hi hackers, I want to propose to you an old patch for Postgres 11, off-site developed by Oliver Ford, but I have permission from him to publish it and to continue it's development, that allow distinct aggregates, like select sum(distinct nums) within a window function. I have rebased it for current git master branch and have made necessary changes to it to work with Postgres 13devel. It's a WIP, because it doesn't have tests yet (I will add them later) and also, it works for a int, float, and numeric types, but probably distinct check can be rewritten for possible performance improvement, with storing the distinct elements in a hash table which should give a performance improvement. If you find the implementation of patch acceptable from committers perspective, I will answer to all yours design and review notes and will try to go ahead with it, also, I will add this patch to the March commit fest. For example usage of a patch, if you have time series data, with current Postgres you will get an error: postgres=# CREATE TABLE t_demo AS postgres-# SELECT ordinality, day, date_part('week', day) AS week postgres-# FROMgenerate_series('2020-01-02', '2020-01-15', '1 day'::interval) postgres-# WITH ORDINALITY AS day; SELECT 14 postgres=# SELECT * FROM t_demo; ordinality | day | week ++-- 1 | 2020-01-02 00:00:00+02 |1 2 | 2020-01-03 00:00:00+02 |1 3 | 2020-01-04 00:00:00+02 |1 4 | 2020-01-05 00:00:00+02 |1 5 | 2020-01-06 00:00:00+02 |2 6 | 2020-01-07 00:00:00+02 |2 7 | 2020-01-08 00:00:00+02 |2 8 | 2020-01-09 00:00:00+02 |2 9 | 2020-01-10 00:00:00+02 |2 10 | 2020-01-11 00:00:00+02 |2 11 | 2020-01-12 00:00:00+02 |2 12 | 2020-01-13 00:00:00+02 |3 13 | 2020-01-14 00:00:00+02 |3 14 | 2020-01-15 00:00:00+02 |3 (14 rows) postgres=# SELECT *, postgres-# array_agg(DISTINCT week) OVER (ORDER BY day ROWS postgres(# BETWEEN 2 PRECEDING AND 2 FOLLOWING) postgres-# FROMt_demo; ERROR: DISTINCT is not implemented for window functions LINE 2: array_agg(DISTINCT week) OVER (ORDER BY day ROWS ^ So you will need to write something like this: postgres=# SELECT *, (SELECT array_agg(DISTINCT unnest) FROM unnest(x)) AS b postgres-# FROM postgres-# ( postgres(# SELECT *, postgres(# array_agg(week) OVER (ORDER BY day ROWS postgres(# BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS x postgres(# FROMt_demo postgres(# ) AS a; ordinality | day | week | x | b ++--+-+--- 1 | 2020-01-02 00:00:00+02 |1 | {1,1,1} | {1} 2 | 2020-01-03 00:00:00+02 |1 | {1,1,1,1} | {1} 3 | 2020-01-04 00:00:00+02 |1 | {1,1,1,1,2} | {1,2} 4 | 2020-01-05 00:00:00+02 |1 | {1,1,1,2,2} | {1,2} 5 | 2020-01-06 00:00:00+02 |2 | {1,1,2,2,2} | {1,2} 6 | 2020-01-07 00:00:00+02 |2 | {1,2,2,2,2} | {1,2} 7 | 2020-01-08 00:00:00+02 |2 | {2,2,2,2,2} | {2} 8 | 2020-01-09 00:00:00+02 |2 | {2,2,2,2,2} | {2} 9 | 2020-01-10 00:00:00+02 |2 | {2,2,2,2,2} | {2} 10 | 2020-01-11 00:00:00+02 |2 | {2,2,2,2,3} | {2,3} 11 | 2020-01-12 00:00:00+02 |2 | {2,2,2,3,3} | {2,3} 12 | 2020-01-13 00:00:00+02 |3 | {2,2,3,3,3} | {2,3} 13 | 2020-01-14 00:00:00+02 |3 | {2,3,3,3} | {2,3} 14 | 2020-01-15 00:00:00+02 |3 | {3,3,3} | {3} (14 rows) With attached version, you will get the desired results: postgres=# SELECT *, postgres-# array_agg(DISTINCT week) OVER (ORDER BY day ROWS postgres(# BETWEEN 2 PRECEDING AND 2 FOLLOWING) postgres-# FROMt_demo; ordinality | day | week | array_agg ++--+--- 1 | 2020-01-02 00:00:00+02 |1 | {1} 2 | 2020-01-03 00:00:00+02 |1 | {1} 3 | 2020-01-04 00:00:00+02 |1 | {1,2} 4 | 2020-01-05 00:00:00+02 |1 | {1,2} 5 | 2020-01-06 00:00:00+02 |2 | {1,2} 6 | 2020-01-07 00:00:00+02 |2 | {1,2} 7 | 2020-01-08 00:00:00+02 |2 | {2} 8 | 2020-01-09 00:00:00+02 |2 | {2} 9 | 2020-01-10 00:00:00+02 |2 | {2} 10 | 2020-01-11 00:00:00+02 |2 | {2,3} 11 | 2020-01-12 00:00:00+02 |2 | {2,3} 12 | 2020-01-13 00:00:00+02 |3 | {2,3} 13 | 2020-01-14 00:00:00+02 |3 | {2,3} 14 | 2020-01-15 00:00:00+02 |3 | {3} (14 rows) diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index