On Tue, May 12, 2009 at 2:21 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> Moreover, I guess you don't even need to buffer tuples to aggregate by
>> different keys. All you have to do is prepare more than one hash
>> table (or set up a sort order if the plan detects that the hash table
>> is too large to fit in memory), and a single seq scan will do. Only
>> the trans values need to be stored in memory, not the outer plan's
>> results. That will be a big win in performance.
>
> That was my first solution, but I would like to prepare a non-hash
> method too. Now I am thinking about a special executor node that fills
> all the necessary hash tables in parallel. It's a special variant of
> hash aggregate.

I think HashAggregate will often be the fastest method of executing
this kind of operation, but it would be nice to have an alternative
(such as repeatedly sorting a tuplestore) to handle non-hashable
datatypes or cases where the HashAggregate would eat too much memory.
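
To make that alternative concrete, here is the same toy problem done by
re-sorting a buffered input once per grouping set (again just a rough
C++ sketch under my own assumptions; sort_and_aggregate is a made-up
name, and a real tuplestore would spill to disk rather than hold a
std::vector in memory):

#include <algorithm>
#include <cstdio>
#include <vector>

struct Row { int a; int b; double v; };

/* Sort the buffered rows on one grouping key, then walk group
 * boundaries to aggregate; no hash table, only an ordering needed. */
template <typename Key>
static void sort_and_aggregate(std::vector<Row> &rows, Key key,
                               const char *label) {
    std::sort(rows.begin(), rows.end(),
              [&](const Row &x, const Row &y) { return key(x) < key(y); });
    for (size_t i = 0; i < rows.size(); ) {
        int k = key(rows[i]);
        double sum = 0;
        for (; i < rows.size() && key(rows[i]) == k; ++i)
            sum += rows[i].v;
        std::printf("%s=%d sum=%g\n", label, k, sum);
    }
}

int main() {
    std::vector<Row> rows = {
        {1, 10, 1.0}, {1, 20, 2.0}, {2, 10, 3.0}, {2, 20, 4.0},
    };
    /* One re-sort per grouping set: usually slower than hashing, but
     * memory-boundable and usable for non-hashable types. */
    sort_and_aggregate(rows, [](const Row &r) { return r.a; }, "a");
    sort_and_aggregate(rows, [](const Row &r) { return r.b; }, "b");
}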

But that leads me to a question - does the existing HashAggregate code
make any attempt to obey work_mem?  I know that the infrastructure is
present for HashJoin/Hash, but on a quick pass I didn't notice
anything similar in HashAggregate.

And on a slightly off-topic note for this thread, is there any
compelling reason why we have at least three different hash
implementations in the executor?  HashJoin/Hash uses one for regular
batches and one for the skew batch, and I believe that HashAggregate
does something else entirely.  It seems like it might improve code
maintainability, if nothing else, to unify these to the extent
possible.

...Robert
