On 04/12/22 02:27, David Rowley wrote:

To make this work when rows can exit the window frame seems
significantly harder. Likely a hash table would be a better data
structure to remove records from, but then how are you going to spill
the hash table to disk when it reaches work_mem? As David J mentions,
it seems like you'd need a hash table with a counter to track how many
times a given value appears and only remove it from the table once
that counter reaches 0.  Unsure how you're going to constrain that to
not use more than work_mem though.

Interesting problem, Hashtables created in normal aggregates (AGG_HASHED mode) may provide some reference as they have hashagg_spill_tuple but I am not sure of any prior implementation of hashtable with counter and spill. Major concern is, if we go through tuplesort route (without order by case), would we get handicapped in future if we want order by or more features?

Are there any other databases which support DISTINCT window aggregate
with an ORDER BY in the window clause?

Oracle db support distinct window aggregates albeit without order by clause. Rest of databases which I've tried (mysql/sqlserver express) don't even support distinct in window aggregates so those gets ruled out as well.

If you were to limit this to only working with the query you mentioned
in [1], i.e PARTITION BY without an ORDER BY, then you only need to
aggregate once per partition per aggregate and you only need to do
that once all of the tuples for the partition are in the tuplestore.
It seems to me like you could add all the records to a tuplesort and
then sort by the DISTINCT column then aggregate everything except for
consecutive duplicates. You can then aggregate any other aggregates
which share the same DISTINCT column, otherwise, you just destroy the
tuplesort and rinse and repeat for the next aggregate.
This looks like way to go that would ensure main use case of portability from Oracle.

If you were to limit this to only working with the query you mentioned
in [1], i.e PARTITION BY without an ORDER BY,

I need to dig deeper into order by case.

--
Regards,
Ankit Kumar Pandey



Reply via email to