On Thu, 19 Feb 2026 at 17:33, Tomas Vondra <[email protected]> wrote:
> > I think the correct fix would be to have a way to insert into
> > simplehash with a limit on size, which means that the insert might
> > fail. I haven't yet looked at how complicated this would be to
> > implement.
> >
>
> Wouldn't it be easier to just start ignoring SH_GROW_MAX_MOVE? That'd
> have a little bit of performance impact on that one key, but that seems
> acceptable. And easier to do than dealing with failing inserts.

I am not sure that is a 100% fix, but it would definitely be a whole
lot better than the current behavior. I'm not sure it's guaranteed
that the two other memory contexts end up larger than the hash table
itself. If they don't, the hashtable could still grow too big: once a
table that is already larger than half the allowed size hits
grow_threshold, doubling it overshoots the limit.

I am more worried about getting a huge run of tuples in the hash
table, which will happen if inserts are correlated by hash value. That
will also cause a quadratic slowdown.

> > I also haven't checked what is the cause for such a long run of
> > collisions. But I think it's related to it being a HashAggregate on
> > top of Gather on top of HashAggregate.
> >
>
> So it's a parallel aggregate? Partial + Finalize? I wonder if that might
> be "correlating" the data in a way that makes it more likely to hit
> SH_GROW_MAX_MOVE. But If that was the case, wouldn't we see this issue
> more often?

Interestingly, the plan doesn't have Partial and Finalize on those hash agg nodes:

                     ->  HashAggregate  (cost=142400.87..142800.87 rows=40000 width=16) (actual time=7978.262..9591.682 rows=3698243 loops=1)
                           Group Key: "*SELECT* 2_4".vehicle_id, "*SELECT* 2_4".day
                           Batches: 21  Memory Usage: 65593kB  Disk Usage: 118256kB
                           ->  Gather  (cost=133600.87..142000.87 rows=80000 width=16) (actual time=1898.473..4772.296 rows=3698243 loops=1)
                                 Workers Planned: 2
                                 Workers Launched: 2
                                 ->  HashAggregate  (cost=132600.87..133000.87 rows=40000 width=16) (actual time=1586.697..2040.368 rows=1232748 loops=3)
                                       Group Key: "*SELECT* 2_4".vehicle_id, "*SELECT* 2_4".day
                                       Batches: 1  Memory Usage: 5137kB
                                       Worker 0:  Batches: 5  Memory Usage: 79921kB  Disk Usage: 40024kB
                                       Worker 1:  Batches: 5  Memory Usage: 81969kB  Disk Usage: 36112kB

There are TimescaleDB tables involved in the plan, so I think
Timescale might be behind that.

There is this comment above the simplehash growing logic:

* To avoid negative consequences from overly imbalanced
* hashtables, grow the hashtable if collisions would require
* us to move a lot of entries.  The most likely cause of such
* imbalance is filling a (currently) small table, from a
* currently big one, in hash-table order.

The problem disappears if I set a breakpoint on tuplehash_grow, so
apparently the excess growth of the upper node requires the lower
hashtable scans to interleave in a particular manner.

I'm wondering if some way to decorrelate the hashtables would help,
for example a hashtable-specific (pseudo)random salt.

Regards,
Ants Aasma

