On 7 June 2018 at 08:11, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> On 06/06/2018 04:11 PM, Andres Freund wrote:
>> Consider e.g. a scheme where we'd switch from hashed aggregation to
>> sorted aggregation due to memory limits, but already have a number of
>> transition values in the hash table. Whenever the size of the transition
>> values in the hashtable exceeds memory size, we write one of them to the
>> tuplesort (with serialized transition value). From then on further input
>> rows for that group would only be written to the tuplesort, as the group
>> isn't present in the hashtable anymore.
>>
> Ah, so you're suggesting that during the second pass we'd deserialize
> the transition value and then add the tuples to it, instead of building
> a new transition value. Got it.
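A minimal Python sketch of the quoted scheme, for reference (everything here is hypothetical: the names are made up, a plain list stands in for the tuplesort, and SUM stands in for an arbitrary transition function; it is not the actual executor code):

```python
# Minimal sketch only -- hypothetical names, a Python list standing in for
# the tuplesort, SUM standing in for an arbitrary transition function.
WORK_MEM_GROUPS = 2                       # stand-in for the work_mem limit

def hash_then_sort_agg(rows):
    table, spill, evicted = {}, [], set()
    for key, value in rows:
        if key in evicted:
            # group was already pushed out: its rows only go to the "tuplesort"
            spill.append(("row", key, value))
        elif key in table:
            table[key] += value           # normal in-memory transition
        else:
            if len(table) >= WORK_MEM_GROUPS:
                # over the memory limit: serialize one transition value
                victim = next(iter(table))
                spill.append(("state", victim, table.pop(victim)))
                evicted.add(victim)
            table[key] = value            # new group enters the hash table
    # "sorted" phase: walk the spilled entries grouped by key, combining the
    # serialized state with the rows that arrived after it was evicted
    result = dict(table)
    for _, key, value in sorted(spill, key=lambda e: e[1]):
        result[key] = result.get(key, 0) + value
    return result

print(hash_then_sort_agg([("a", 1), ("b", 2), ("c", 3), ("a", 4), ("c", 5)]))
# {'b': 2, 'c': 8, 'a': 5}
```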
Having to deserialize every time we add a new tuple sounds terrible from
a performance point of view.

Can't we just:

1. HashAgg until the hash table reaches work_mem.
2. Spill the entire table to disk.
3. Destroy the table and create a new one.
4. If more tuples: goto 1
5. Merge sort and combine each dumped set of tuples.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
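A minimal Python sketch of steps 1-5 above (again hypothetical: in-memory lists stand in for the spilled runs, heapq.merge stands in for the tuplesort merge, and SUM stands in for the transition and combine functions):

```python
# Minimal sketch only -- hypothetical names, Python lists standing in for the
# spilled runs, SUM standing in for the transition and combine functions.
from heapq import merge

WORK_MEM_GROUPS = 4                        # stand-in for the work_mem limit

def hashagg_spill_and_merge(rows):
    runs, table = [], {}
    for key, value in rows:
        if key not in table and len(table) >= WORK_MEM_GROUPS:
            runs.append(sorted(table.items()))   # 2. spill the whole table
            table = {}                           # 3. destroy it, start a new one
        table[key] = table.get(key, 0) + value   # 1. HashAgg (transition)
    if table:                                    # 4. repeat until input is gone
        runs.append(sorted(table.items()))
    result = {}
    for key, partial in merge(*runs):            # 5. merge the sorted runs and
        result[key] = result.get(key, 0) + partial   # combine per-group states
    return result

rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5), ("a", 6), ("b", 7)]
print(hashagg_spill_and_merge(rows))
# {'a': 7, 'b': 9, 'c': 3, 'd': 4, 'e': 5}
```

In this reading, each group's partial states are only combined once, during the final merge, so no per-tuple deserialization is needed.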