On Mon, 2020-05-25 at 04:10 +0200, Tomas Vondra wrote:
> algorithm    master    prealloc    tlist    prealloc-tlist
> --------------------------------------------------
> hash           1365         437      368               213
> sort            226         214      224               215
>
> The sort row simply means "enable_hashagg = off" and AFAIK the patches
> should not have a lot of influence here - the prealloc does, but it's
> fairly negligible.

I also saw a small speedup from the prealloc patch for Sort. I wrote it
off initially, but I'm wondering if there's something going on there.
Perhaps drawing K elements from the minheap at once is better for
caching? If so, that's good news, because it means the prealloc list is
a win-win.

> -> Finalize HashAggregate
>      Group Key: lineitem_1.l_partkey
>      -> Gather
>           Workers Planned: 2
>           -> Partial HashAggregate
>                Group Key: lineitem_1.l_partkey
>                -> Parallel Seq Scan on lineitem lineitem_1
> (20 rows)

Although each worker here only gets half the tuples, it will get
(approximately) all of the *groups*. This may partly explain why the
planner moves away from this plan when there are more workers: the
number of hashagg batches doesn't go down much with more workers.

It also might be interesting to know the estimate for the number of
groups relative to the size of the table. If those two are close, it
might look to the planner like processing the whole input in each
worker isn't much worse than processing all of the groups in each
worker.

Regards,
	Jeff Davis