On 06/11/2018 08:13 PM, Jeff Davis wrote:
> On Mon, 2018-06-11 at 19:33 +0200, Tomas Vondra wrote:
>> For example we hit the work_mem limit after processing 10% tuples,
>> switching to sort would mean spill+sort of 900GB of data. Or we
>> might say - hmm, we're 10% through, so we expect hitting the limit
>> 10x, so let's spill the hash table and then do sort on that,
>> writing and sorting only 10GB of data. (Or merging it in some
>> hash-based way, per Robert's earlier message.)
>
> Your example depends on large groups and a high degree of group
> clustering. That's fine, but it's a special case,
>
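(For reference, a minimal sketch of the arithmetic in the quoted
example, assuming a 1TB input and roughly 1GB of work_mem - both of
those numbers are assumptions, only the "10%" comes from the quote:)

    -- 1000 GB input and ~1 GB work_mem are assumed; only the 10% figure
    -- is taken from the quoted example above.
    SELECT 1000 * (1 - 0.10)  AS gb_to_spill_and_sort_if_switching_to_sort,
           ceil(1 / 0.10)     AS expected_hash_table_overflows,
           ceil(1 / 0.10) * 1 AS gb_spilled_if_dumping_the_hash_table;
    -- => roughly 900 GB written and sorted vs. 10 x 1 GB = 10 GB spilled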
True, it's a special case and it won't work for other cases. It was
merely an example for Andres. OTOH it's not entirely unrealistic, I
think. Consider something like

    SELECT extract(year from ts) AS yr,
           extract(month from ts) AS m,
           extract(day from ts) AS d,
           string_agg(x, ','),
           array_agg(y)
      FROM fact_table
     GROUP BY yr, m, d;

which groups by keys that are likely strongly correlated with the
physical order of the table (assuming new data is appended to it), and
where string_agg/array_agg are likely to produce fairly large aggregate
states (roughly proportional to the number of tuples in the group).

Another example might be an HLL aggregate, although in that case the
transition state does not grow, so it may not be that bad (and the
default estimate of 1kB would work pretty nicely). But there certainly
are other aggregates with large transition states, where this might not
be the case, and we currently have no way to communicate that to the
planner - except for setting work_mem much lower :-/

However, I now realize I've ignored the fact that we typically don't
sort the whole table but only a few of its columns, so the example was
not entirely fair - we would not sort the whole remaining 900GB but
likely much less.

> and complexity does have a cost, too.

Sure.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services