Re: Default setting for enable_hashagg_disk

Andres Freund Thu, 25 Jun 2020 13:37:53 -0700

Hi,

On 2020-06-25 14:25:12 -0400, Bruce Momjian wrote:
> I am still trying to get my head around why the spill is going to be so
> much work to adjust for hash agg than our other spillable nodes.


Aggregates are the classical case used to process large amounts of
data. For larger amounts of data sorted input (be it via explicit sort
or ordered index scan) isn't an attractive option. IOW hash-agg is the
common case.  There's also fewer stats for halfway accurately estimating
the number of groups and the size of the transition state - a sort /
hash join doesn't have an equivalent to the variably sized transition
value.


> What are people doing for those cases already?  Do we have an
> real-world queries that are a problem in PG 13 for this?

I don't know about real world, but it's pretty easy to come up with
examples.

query:
SELECT a, array_agg(b) FROM (SELECT generate_series(1, 10000)) a(a), (SELECT 
generate_series(1, 10000)) b(b) GROUP BY a HAVING array_length(array_agg(b), 1) 
= 0;

work_mem = 4MB

12      18470.012 ms
HEAD    44635.210 ms

HEAD causes ~2.8GB of file IO, 12 doesn't cause any. If you're IO
bandwidth constrained, this could be quite bad.

Obviously this is contrived, and a pretty extreme case. But if you
imagine this happening on a system where disk IO isn't super fast
(e.g. just about any cloud provider).

An even more extreme version of the above is this:


query: SELECT a, array_agg(b) FROM (SELECT generate_series(1, 50000)) a(a), 
(SELECT generate_series(1, 10000)) b(b) GROUP BY a HAVING 
array_length(array_agg(b), 1) = 0;

work_mem = 16MB
12      81598.965 ms
HEAD    210772.360 ms

temporary tablespace on magnetic disk (raid 0 of two 7.2k server
spinning disks)

12      81136.530 ms
HEAD   225182.560 ms

The disks are busy in some periods, but still keep up. If I however make
the transition state a bit bigger:

query: SELECT a, array_agg(b), count(c), max(d),max(e) FROM (SELECT 
generate_series(1, 10000)) a(a), (SELECT generate_series(1, 5000)::text, 
repeat(random()::text, 10), repeat(random()::text, 10), repeat(random()::text, 
10)) b(b,c,d,e) GROUP BY a HAVING array_length(array_agg(b), 1) = 0;

12      28164.865 ms

fast ssd:
HEAD    92520.680 ms

magnetic:
HEAD    183968.538 ms

(no reads, there's plenty enough memory. Just writes because the age /
amount thresholds for dirty data are reached)

In the magnetic case we're IO bottlenecked nearly the whole time.


Just to be clear: I think this is a completely over-the-top example. But
I do think it shows the problem to some degree at least.

Greetings,

Andres Freund

Re: Default setting for enable_hashagg_disk

Reply via email to