On 7 June 2018 at 08:11, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> On 06/06/2018 04:11 PM, Andres Freund wrote:
>> Consider e.g. a scheme where we'd switch from hashed aggregation to
>> sorted aggregation due to memory limits, but already have a number of
>> transition values in the hash table. Whenever the size of the transition
>> values in the hashtable exceeds memory size, we write one of them to the
>> tuplesort (with serialized transition value). From then on further input
>> rows for that group would only be written to the tuplesort, as the group
>> isn't present in the hashtable anymore.
>>
> Ah, so you're suggesting that during the second pass we'd deserialize
> the transition value and then add the tuples to it, instead of building
> a new transition value. Got it.
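A minimal Python sketch of the quoted scheme, for reference (everything here is hypothetical: the names are made up, a plain list stands in for the tuplesort, and SUM stands in for an arbitrary transition function; it is not the actual executor code):

```python
# Minimal sketch only -- hypothetical names, a Python list standing in for
# the tuplesort, SUM standing in for an arbitrary transition function.
WORK_MEM_GROUPS = 2                       # stand-in for the work_mem limit

def hash_then_sort_agg(rows):
    table, spill, evicted = {}, [], set()
    for key, value in rows:
        if key in evicted:
            # group was already pushed out: its rows only go to the "tuplesort"
            spill.append(("row", key, value))
        elif key in table:
            table[key] += value           # normal in-memory transition
        else:
            if len(table) >= WORK_MEM_GROUPS:
                # over the memory limit: serialize one transition value
                victim = next(iter(table))
                spill.append(("state", victim, table.pop(victim)))
                evicted.add(victim)
            table[key] = value            # new group enters the hash table
    # "sorted" phase: walk the spilled entries grouped by key, combining the
    # serialized state with the rows that arrived after it was evicted
    result = dict(table)
    for _, key, value in sorted(spill, key=lambda e: e[1]):
        result[key] = result.get(key, 0) + value
    return result

print(hash_then_sort_agg([("a", 1), ("b", 2), ("c", 3), ("a", 4), ("c", 5)]))
# {'b': 2, 'c': 8, 'a': 5}
```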
Having to deserialize every time we add a new tuple sounds terrible from
a performance point of view.

Can't we just:

1. HashAgg until the hash table reaches work_mem.
2. Spill the entire table to disk.
3. Destroy the table and create a new one.
4. If more tuples: goto 1
5. Merge sort and combine each dumped set of tuples.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
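A minimal Python sketch of steps 1-5 above (again hypothetical: in-memory lists stand in for the spilled runs, heapq.merge stands in for the tuplesort merge, and SUM stands in for the transition and combine functions):

```python
# Minimal sketch only -- hypothetical names, Python lists standing in for the
# spilled runs, SUM standing in for the transition and combine functions.
from heapq import merge

WORK_MEM_GROUPS = 4                        # stand-in for the work_mem limit

def hashagg_spill_and_merge(rows):
    runs, table = [], {}
    for key, value in rows:
        if key not in table and len(table) >= WORK_MEM_GROUPS:
            runs.append(sorted(table.items()))   # 2. spill the whole table
            table = {}                           # 3. destroy it, start a new one
        table[key] = table.get(key, 0) + value   # 1. HashAgg (transition)
    if table:                                    # 4. repeat until input is gone
        runs.append(sorted(table.items()))
    result = {}
    for key, partial in merge(*runs):            # 5. merge the sorted runs and
        result[key] = result.get(key, 0) + partial   # combine per-group states
    return result

rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5), ("a", 6), ("b", 7)]
print(hashagg_spill_and_merge(rows))
# {'a': 7, 'b': 9, 'c': 3, 'd': 4, 'e': 5}
```

In this reading, each group's partial states are only combined once, during the final merge, so no per-tuple deserialization is needed.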