Re: Hash aggregate collisions cause excessive spilling

Tomas Vondra Thu, 19 Feb 2026 07:33:46 -0800

On 2/19/26 15:49, Ants Aasma wrote:
> Investigating a customer complaint I ran into an issue with the hash
> aggregate code. The problem was that a query that usually completes in
> less than a minute sometimes gets stuck indefinitely (hours+). I
> tracked it down to a hash aggregate node returning one tuple from a
> batch and spilling the rest.
> 
> The reason for the behavior is that aggstate->hash_metacxt was 100M,
> which is larger than work_mem*hash_mem_multiplier of 64M. This makes
> hash_agg_check_limits() always spill after the first tuple. I think
> that ends up having a n² overhead, with n being almost 4M here.
>


Not great :-(

> I don't have a simple reproducer yet, because the live problem was on
> a parallel query where looking at the backend wrong caused the problem
> to disappear. After some retries I was able to catch an instance of
> growing past work_mem with gdb. After that growth the simplehash was
> {size = 4194304, members = 409839, ..}, i.e. the table was only 20%
> full before growing. So the cause seems to be a run of hash collisions
> bigger than SH_GROW_MAX_MOVE (150).
> 
> AFAICT there is nothing in simplehash that would stop it growing past
> work_mem, and once it does the spilling logic in
> agg_refill_hash_table() enters this degenerate state until the end of
> the plan node.
> 
> I think the correct fix would be to have a way to insert into
> simplehash with a limit on size, which means that the insert might
> fail. I haven't yet looked at how complicated this would be to
> implement.
> 

Wouldn't it be easier to just start ignoring SH_GROW_MAX_MOVE? That'd
have a little bit of performance impact on that one key, but that seems
acceptable. And easier to do than dealing with failing inserts.

> I also haven't checked what is the cause for such a long run of
> collisions. But I think it's related to it being a HashAggregate on
> top of Gather on top of HashAggregate.
> 

So it's a parallel aggregate? Partial + Finalize? I wonder if that might
be "correlating" the data in a way that makes it more likely to hit
SH_GROW_MAX_MOVE. But If that was the case, wouldn't we see this issue
more often?


regards

-- 
Tomas Vondra

Re: Hash aggregate collisions cause excessive spilling

Reply via email to