On 06/07/2018 02:18 AM, Andres Freund wrote:
> On 2018-06-06 17:17:52 -0700, Andres Freund wrote:
>> On 2018-06-07 12:11:37 +1200, David Rowley wrote:
>>> On 7 June 2018 at 08:11, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>>>> On 06/06/2018 04:11 PM, Andres Freund wrote:
>>>>> Consider e.g. a scheme where we'd switch from hashed aggregation to
>>>>> sorted aggregation due to memory limits, but already have a number of
>>>>> transition values in the hash table. Whenever the size of the transition
>>>>> values in the hash table exceeds the memory limit, we write one of them
>>>>> to the tuplesort (with its serialized transition value). From then on,
>>>>> further input rows for that group would only be written to the tuplesort,
>>>>> as the group is no longer present in the hash table.
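To make that scheme concrete, here is a minimal sketch in Python (assumed
semantics, not PostgreSQL code): the names feed, MEMORY_LIMIT, hash_table,
spilled, and tuplesort are all invented, SUM stands in for an arbitrary
transition function, a plain list stands in for the tuplesort, and the
victim-selection policy is arbitrary.

    MEMORY_LIMIT = 3        # max groups resident in the hash table

    hash_table = {}         # group key -> in-memory transition value
    spilled = set()         # groups already evicted to the tuplesort
    tuplesort = []          # entries: (key, kind, payload)

    def feed(key, value):
        if key in spilled:
            # The group was evicted earlier: its further input rows go
            # straight to the tuplesort as raw rows.
            tuplesort.append((key, 'row', value))
            return

        # Ordinary hashed aggregation; SUM plays the transition function.
        hash_table[key] = hash_table.get(key, 0) + value

        while len(hash_table) > MEMORY_LIMIT:
            # Over the memory budget: write one group's serialized
            # transition value to the tuplesort and drop the group from
            # the hash table.
            victim = next(iter(hash_table))
            tuplesort.append((victim, 'state', hash_table.pop(victim)))
            spilled.add(victim)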


>>>> Ah, so you're suggesting that during the second pass we'd deserialize
>>>> the transition value and then add the tuples to it, instead of building
>>>> a new transition value. Got it.

>>> Having to deserialize every time we add a new tuple sounds terrible
>>> from a performance point of view.

>> I didn't mean that we'd do that, and I don't think David understood it
>> that way either. I was talking about the approach where the second pass
>> is a sort rather than hash-based aggregation.  Then we would *not* need
>> to deserialize more than exactly once.
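Continuing the sketch above (same invented structures), the sort-based
second pass groups each key's spilled entries together; a group's
serialized transition value is picked up exactly once, and its remaining
raw rows are folded in with the ordinary transition function.

    def second_pass():
        # Results for groups that never spilled are already final.
        results = dict(hash_table)

        current_key, state = None, None
        for key, kind, payload in sorted(tuplesort, key=lambda t: t[0]):
            if key != current_key:
                if current_key is not None:
                    results[current_key] = state
                current_key, state = key, 0
            if kind == 'state':
                # The one-time "deserialization"/combine of the spilled
                # transition value; for SUM this is plain addition, though
                # in general the combine and transition steps differ.
                state += payload
            else:
                state += payload    # transition on a raw input row
        if current_key is not None:
            results[current_key] = state
        return results

Feeding all rows through feed() and then calling second_pass() yields the
same per-group results as a pure in-memory hash aggregate, with each
spilled transition value deserialized only once.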

> s/David/Tomas/, obviously. Sorry, it's been a long day.


The solution is simple: drink more coffee. ;-)

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
