Hi, On 2020-06-25 14:25:12 -0400, Bruce Momjian wrote: > I am still trying to get my head around why the spill is going to be so > much work to adjust for hash agg than our other spillable nodes.
Aggregates are the classical case used to process large amounts of data. For larger amounts of data sorted input (be it via explicit sort or ordered index scan) isn't an attractive option. IOW hash-agg is the common case. There's also fewer stats for halfway accurately estimating the number of groups and the size of the transition state - a sort / hash join doesn't have an equivalent to the variably sized transition value. > What are people doing for those cases already? Do we have an > real-world queries that are a problem in PG 13 for this? I don't know about real world, but it's pretty easy to come up with examples. query: SELECT a, array_agg(b) FROM (SELECT generate_series(1, 10000)) a(a), (SELECT generate_series(1, 10000)) b(b) GROUP BY a HAVING array_length(array_agg(b), 1) = 0; work_mem = 4MB 12 18470.012 ms HEAD 44635.210 ms HEAD causes ~2.8GB of file IO, 12 doesn't cause any. If you're IO bandwidth constrained, this could be quite bad. Obviously this is contrived, and a pretty extreme case. But if you imagine this happening on a system where disk IO isn't super fast (e.g. just about any cloud provider). An even more extreme version of the above is this: query: SELECT a, array_agg(b) FROM (SELECT generate_series(1, 50000)) a(a), (SELECT generate_series(1, 10000)) b(b) GROUP BY a HAVING array_length(array_agg(b), 1) = 0; work_mem = 16MB 12 81598.965 ms HEAD 210772.360 ms temporary tablespace on magnetic disk (raid 0 of two 7.2k server spinning disks) 12 81136.530 ms HEAD 225182.560 ms The disks are busy in some periods, but still keep up. If I however make the transition state a bit bigger: query: SELECT a, array_agg(b), count(c), max(d),max(e) FROM (SELECT generate_series(1, 10000)) a(a), (SELECT generate_series(1, 5000)::text, repeat(random()::text, 10), repeat(random()::text, 10), repeat(random()::text, 10)) b(b,c,d,e) GROUP BY a HAVING array_length(array_agg(b), 1) = 0; 12 28164.865 ms fast ssd: HEAD 92520.680 ms magnetic: HEAD 183968.538 ms (no reads, there's plenty enough memory. Just writes because the age / amount thresholds for dirty data are reached) In the magnetic case we're IO bottlenecked nearly the whole time. Just to be clear: I think this is a completely over-the-top example. But I do think it shows the problem to some degree at least. Greetings, Andres Freund