On 06/11/2018 08:13 PM, Jeff Davis wrote:
> On Mon, 2018-06-11 at 19:33 +0200, Tomas Vondra wrote:
>> For example we hit the work_mem limit after processing 10% tuples,
>> switching to sort would mean spill+sort of 900GB of data. Or we
>> might say - hmm, we're 10% through, so we expect hitting the limit
>> 10x, so let's spill the hash table and then do sort on that,
>> writing and sorting only 10GB of data. (Or merging it in some
>> hash-based way, per Robert's earlier message.)
>
> Your example depends on large groups and a high degree of group
> clustering. That's fine, but it's a special case,
>
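(For reference, a minimal sketch of the arithmetic in the quoted
example, assuming a 1TB input and roughly 1GB of work_mem - both of
those numbers are assumptions, only the "10%" comes from the quote:)

    -- 1000 GB input and ~1 GB work_mem are assumed; only the 10% figure
    -- is taken from the quoted example above.
    SELECT 1000 * (1 - 0.10)  AS gb_to_spill_and_sort_if_switching_to_sort,
           ceil(1 / 0.10)     AS expected_hash_table_overflows,
           ceil(1 / 0.10) * 1 AS gb_spilled_if_dumping_the_hash_table;
    -- => roughly 900 GB written and sorted vs. 10 x 1 GB = 10 GB spilled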
True, it's a special case and it won't work for other cases. It was
merely an example for Andres. OTOH it's not entirely unrealistic, I
think. Consider something like

    SELECT extract(year from ts) AS yr,
           extract(month from ts) AS m,
           extract(day from ts) AS d,
           string_agg(x, ','),
           array_agg(y)
      FROM fact_table
     GROUP BY yr, m, d;

which groups by keys that are likely strongly correlated with the
physical order of the table (assuming new data is appended to it), and
where string_agg/array_agg are likely to produce fairly large aggregate
states (roughly proportional to the number of tuples in the group).

Another example might be an HLL aggregate, although in that case the
transition state does not grow, so it may not be that bad (and the
default estimate of 1kB would work pretty nicely). But there certainly
are other aggregates with large transition states, where this might not
be the case, and we currently have no way to communicate that to the
planner - except for setting work_mem much lower :-/

However, I now realize I've ignored the fact that we typically don't
sort the whole table but only a few of its columns, so the example was
not entirely fair - we would not sort the whole remaining 900GB but
likely much less.

> and complexity does have a cost, too.

Sure.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services