Re: Questions regarding distinct operation implementation

2022-12-05 Thread David Rowley
On Mon, 5 Dec 2022 at 02:34, Ankit Kumar Pandey wrote: > Interesting problem, Hashtables created in normal aggregates (AGG_HASHED > mode) may provide some reference as they have hashagg_spill_tuple but I > am not sure of any prior implementation of hashtable with counter and > spill. I'm unsure

Re: Questions regarding distinct operation implementation

2022-12-04 Thread Ankit Kumar Pandey
On 04/12/22 22:25, Vik Fearing wrote: On 12/4/22 14:34, Ankit Kumar Pandey wrote: This looks like way to go that would ensure main use case of portability from Oracle. The goal should not be portability from Oracle, but adherence to the standard. Yes, Vik. You are right. Wrong remark

Re: Questions regarding distinct operation implementation

2022-12-04 Thread Vik Fearing
On 12/4/22 14:34, Ankit Kumar Pandey wrote: On 04/12/22 02:27, David Rowley wrote: If you were to limit this to only working with the query you mentioned in [1], i.e PARTITION BY without an ORDER BY, then you only need to aggregate once per partition per aggregate and you only need to do

Re: Questions regarding distinct operation implementation

2022-12-04 Thread Ankit Kumar Pandey
On 04/12/22 02:27, David Rowley wrote: To make this work when rows can exit the window frame seems significantly harder. Likely a hash table would be a better data structure to remove records from, but then how are you going to spill the hash table to disk when it reaches work_mem? As David J

Re: Questions regarding distinct operation implementation

2022-12-03 Thread David Rowley
On Sun, 4 Dec 2022 at 08:57, Ankit Kumar Pandey wrote: > On 04/12/22 00:50, David Rowley wrote: >> providing you can code it in such a way that you only >> allocate one of these at once, i.e not allocate one per DISTINCT >> aggregate all at once. > > I am not sure if I understand this, does it

Re: Questions regarding distinct operation implementation

2022-12-03 Thread Ankit Kumar Pandey
On 04/12/22 00:50, David Rowley wrote: We do our best to ensure that a given executor node never uses more than work_mem. Certainly, we still do have nodes that can exceed this by a long way. It would be unlikely that we'd accept anything new that could do this. Makes sense, also would

Re: Questions regarding distinct operation implementation

2022-12-03 Thread David Rowley
On Sat, 3 Dec 2022 at 20:36, Ankit Kumar Pandey wrote: > Shouldn't this be an acceptable tradeoff if someone wants to perform > extra operation in plain old aggregates? Although I am not sure how much > this extra memory and compute usage is considered as acceptable. We do our best to ensure

Re: Questions regarding distinct operation implementation

2022-12-02 Thread Ankit Kumar Pandey
On 02/12/22 03:21, David G. Johnston wrote:  The main concern, I suspect, isn't implementation ability, it is speed and memory consumption. Hi David, Shouldn't this be an acceptable tradeoff if someone wants to perform extra operation in plain old aggregates? Although I am not sure how

Re: Questions regarding distinct operation implementation

2022-12-02 Thread Ankit Kumar Pandey
On 02/12/22 03:07, David Rowley wrote: On Fri, 2 Dec 2022 at 08:10, Ankit Kumar Pandey wrote: select avg(distinct id) over (partition by name) from mytable (in oracle db) yields: 2 2 2 2 10 From this, it is seen distinct is taken across the all rows in the partition. Due to the lack of

Re: Questions regarding distinct operation implementation

2022-12-02 Thread Ankit Kumar Pandey
On 02/12/22 00:40, Ankit Kumar Pandey wrote: On 25/11/22 11:00, Ankit Kumar Pandey wrote: On 25/11/22 02:14, David Rowley wrote: On Fri, 25 Nov 2022 at 06:57, Ankit Kumar Pandey wrote: Please let me know any opinions on this. I think if you're planning on working on this then step 1

Re: Questions regarding distinct operation implementation

2022-12-01 Thread David G. Johnston
On Thu, Dec 1, 2022 at 2:37 PM David Rowley wrote: > > The question is, what do you want to make work? If you're not worried > about supporting DISTINCT when there is an ORDER BY clause and the > frame options are effectively ROWS BETWEEN UNBOUNDED PRECEDING AND > UNBOUNDED FOLLOWING, then it's

Re: Questions regarding distinct operation implementation

2022-12-01 Thread David Rowley
On Fri, 2 Dec 2022 at 08:10, Ankit Kumar Pandey wrote: > select avg(distinct id) over (partition by name) from mytable (in oracle db) > yields: > 2 > 2 > 2 > 2 > 10 > > From this, it is seen distinct is taken across the all rows in the partition. Due to the lack of ORDER BY clause, all rows in

Re: Questions regarding distinct operation implementation

2022-12-01 Thread Ankit Kumar Pandey
On 25/11/22 11:00, Ankit Kumar Pandey wrote: On 25/11/22 02:14, David Rowley wrote: On Fri, 25 Nov 2022 at 06:57, Ankit Kumar Pandey wrote: Please let me know any opinions on this. I think if you're planning on working on this then step 1 would have to be checking the SQL standard to see

Re: Questions regarding distinct operation implementation

2022-11-24 Thread Ankit Kumar Pandey
On 25/11/22 02:14, David Rowley wrote: On Fri, 25 Nov 2022 at 06:57, Ankit Kumar Pandey wrote: Please let me know any opinions on this. I think if you're planning on working on this then step 1 would have to be checking the SQL standard to see which set of rows it asks implementations to

Re: Questions regarding distinct operation implementation

2022-11-24 Thread David Rowley
On Fri, 25 Nov 2022 at 06:57, Ankit Kumar Pandey wrote: > Please let me know any opinions on this. I think if you're planning on working on this then step 1 would have to be checking the SQL standard to see which set of rows it asks implementations to consider for duplicate checks when deciding

Re: Questions regarding distinct operation implementation

2022-11-24 Thread Ankit Kumar Pandey
On 23/11/22 23:48, Ankit Kumar Pandey wrote: Hello, I have questions regarding distinct operation and would be glad if someone could help me out. Consider the following table (mytable): id, name 1, A 1, A 2, B 3, A 1, A If we do /select avg(id) over (partition by name) from

Questions regarding distinct operation implementation

2022-11-23 Thread Ankit Kumar Pandey
Hello, I have questions regarding distinct operation and would be glad if someone could help me out. Consider the following table (mytable): id, name 1, A 1, A 2, B 3, A 1, A If we do /select avg(id) over (partition by name) from mytable/, partition logic goes like this: for A: 1,