Thank you, everyone, for the reviews, feedback, and testing.

Please find attached a new version of the patch.

On Tue, Sep 23, 2025 at 7:11 AM Chao Li <li.evan.c...@gmail.com> wrote:

>
>
> On Sep 23, 2025, at 07:35, David Rowley <dgrowle...@gmail.com> wrote:
>
> On Tue, 23 Sept 2025 at 11:21, Chao Li <li.evan.c...@gmail.com> wrote:
>
> I guess that is because, earlier in the function, nbatch is always
> clamped with:
>
> nbatch = pg_nextpower2_32(Max(2, minbatch));
>
>
> I don't follow which part of that line could constitute clamping.
> Maybe you've confused Max with Min?
>
> David
>
>
> Sorry for the confusion; I actually meant “minbatch”.
>
> I have traced through the function several times. First, with a normal
> query (one without much data involved),
>
> if (inner_rel_bytes + bucket_bytes > hash_table_bytes)
> {
>
> is hard to satisfy, so nbatch ends up as just 1.
>
> With a large amount of data involved, it does enter the “if” branch, but
> even then minbatch rarely climbs very high.
>
> To clarify, I just created a test:
>
> ```
> evantest=# SET enable_nestloop = off;
> SET
> evantest=# SET enable_mergejoin = off;
> SET
> evantest=# SET enable_hashjoin = on;
> SET
> evantest=# CREATE TEMP TABLE inner_tbl AS
> evantest-# SELECT g AS id, repeat('x', 2000) AS filler
> evantest-# FROM generate_series(1, 200000) g;
> SELECT 200000
> evantest=# CREATE TEMP TABLE outer_tbl AS
> evantest-# SELECT g AS id FROM generate_series(1, 1000000) g;
> SELECT 1000000
> evantest=#
> evantest=#
> evantest=# EXPLAIN ANALYZE
> evantest-# SELECT *
> evantest-# FROM outer_tbl o
> evantest-# JOIN inner_tbl i
> evantest-#   ON o.id = i.id;
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=34647.00..1270978635.52 rows=36306020352 width=40) (actual time=353.908..1355.735 rows=200000.00 loops=1)
>    Hash Cond: (i.id = o.id)
>    Buffers: local read=54528 dirtied=54425 written=54418, temp read=45853 written=45853
>    ->  Seq Scan on inner_tbl i  (cost=0.00..113608.96 rows=6356096 width=36) (actual time=1.132..460.711 rows=200000.00 loops=1)
>          Buffers: local read=50048 dirtied=50000 written=49993
>    ->  Hash  (cost=15904.00..15904.00 rows=1142400 width=4) (actual time=351.280..351.282 rows=1000000.00 loops=1)
>          Buckets: 262144  Batches: 8  Memory Usage: 6446kB
>          Buffers: local read=4480 dirtied=4425 written=4425, temp written=2560
>          ->  Seq Scan on outer_tbl o  (cost=0.00..15904.00 rows=1142400 width=4) (actual time=0.760..162.229 rows=1000000.00 loops=1)
>                Buffers: local read=4480 dirtied=4425 written=4425
>  Planning:
>    Buffers: shared hit=14
>  Planning Time: 389649.420 ms
>  Execution Time: 1362.392 ms
> (14 rows)
> ```
>
> In this test, minbatch is just 64.
>
> But I agree, I have never tested with a truly large amount of data. I
> don’t actually know how much data it takes to push nbatch to ~130K (the
> value that will overflow if nbatch is of int type).
>
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
>

Attachment: 0002-Fix-overflow-of-nbatch.patch
Description: Binary data
