Thank you, everyone, for the reviews, feedback, and testing. Please find attached a new version of the patch.
On Tue, Sep 23, 2025 at 7:11 AM Chao Li <li.evan.c...@gmail.com> wrote:
>
> > On Sep 23, 2025, at 07:35, David Rowley <dgrowle...@gmail.com> wrote:
> >
> > On Tue, 23 Sept 2025 at 11:21, Chao Li <li.evan.c...@gmail.com> wrote:
> > > I guess that is because earlier in the function, nbatch is always
> > > clamped with:
> > >
> > > nbatch = pg_nextpower2_32(Max(2, minbatch));
> >
> > I don't follow which part of that line could be construed as
> > clamping. Maybe you've confused Max with Min?
> >
> > David
>
> Sorry for being misleading. I actually meant "minbatch".
>
> I have traced through the function several times. First, with a normal
> query (not much data involved), the condition
>
> if (inner_rel_bytes + bucket_bytes > hash_table_bytes)
> {
>
> is hard to meet, so nbatch will be just 1.
>
> With big data involved, it will enter the "if" clause, but it is still
> hard for minbatch to go very high.
>
> To clarify, I just created a test:
>
> ```
> evantest=# SET enable_nestloop = off;
> SET
> evantest=# SET enable_mergejoin = off;
> SET
> evantest=# SET enable_hashjoin = on;
> SET
> evantest=# CREATE TEMP TABLE inner_tbl AS
> evantest-# SELECT g AS id, repeat('x', 2000) AS filler
> evantest-# FROM generate_series(1, 200000) g;
> SELECT 200000
> evantest=# CREATE TEMP TABLE outer_tbl AS
> evantest-# SELECT g AS id FROM generate_series(1, 1000000) g;
> SELECT 1000000
> evantest=#
> evantest=#
> evantest=# EXPLAIN ANALYZE
> evantest-# SELECT *
> evantest-# FROM outer_tbl o
> evantest-# JOIN inner_tbl i
> evantest-# ON o.id = i.id;
>                                                                QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=34647.00..1270978635.52 rows=36306020352 width=40) (actual time=353.908..1355.735 rows=200000.00 loops=1)
>    Hash Cond: (i.id = o.id)
>    Buffers: local read=54528 dirtied=54425 written=54418, temp read=45853 written=45853
>    ->  Seq Scan on inner_tbl i  (cost=0.00..113608.96 rows=6356096 width=36) (actual time=1.132..460.711 rows=200000.00 loops=1)
>          Buffers: local read=50048 dirtied=50000 written=49993
>    ->  Hash  (cost=15904.00..15904.00 rows=1142400 width=4) (actual time=351.280..351.282 rows=1000000.00 loops=1)
>          Buckets: 262144  Batches: 8  Memory Usage: 6446kB
>          Buffers: local read=4480 dirtied=4425 written=4425, temp written=2560
>          ->  Seq Scan on outer_tbl o  (cost=0.00..15904.00 rows=1142400 width=4) (actual time=0.760..162.229 rows=1000000.00 loops=1)
>                Buffers: local read=4480 dirtied=4425 written=4425
>  Planning:
>    Buffers: shared hit=14
>  Planning Time: 389649.420 ms
>  Execution Time: 1362.392 ms
> (14 rows)
> ```
>
> In this test, minbatch is just 64.
>
> But I agree, I never tested with a large amount of data. I don't
> actually know how much data it takes to make nbatch reach ~130K (the
> value that would lead to overflow if nbatch is of int type).
>
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
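To make the ~130K figure above concrete: 131072 is 2^17, and if each batch needs two BLCKSZ-sized batch-file buffers (one per join side), the int product nbatch * 2 * BLCKSZ lands exactly on 2^31 at that point, one past INT_MAX. Below is a minimal standalone sketch of both points from the quoted discussion; it is not the actual ExecChooseHashTableSize() code: next_power2_32() is a simplified stand-in for pg_nextpower2_32(), the "two buffers per batch" accounting is an illustrative assumption, and minbatch = 100000 is just an arbitrary large input.

```
/*
 * Standalone sketch; not the actual ExecChooseHashTableSize() code.
 * next_power2_32() stands in for pg_nextpower2_32(), and the
 * "2 * BLCKSZ per batch" accounting is an illustrative assumption.
 */
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ 8192                     /* PostgreSQL's default block size */
#define Max(x, y) ((x) > (y) ? (x) : (y))

/* Smallest power of 2 >= num (simplified stand-in for pg_nextpower2_32) */
static uint32_t
next_power2_32(uint32_t num)
{
	uint32_t	result = 1;

	while (result < num)
		result <<= 1;
	return result;
}

int
main(void)
{
	/*
	 * Point 1: Max(2, minbatch) is a floor, not a clamp.  It only keeps
	 * nbatch from dropping below 2; nothing here bounds it from above.
	 */
	int			minbatch = 100000;	/* arbitrary large value */
	int			nbatch = (int) next_power2_32(Max(2, minbatch));

	printf("nbatch = %d\n", nbatch);	/* 131072 = 2^17, not capped */

	/*
	 * Point 2: with two BLCKSZ-sized batch-file buffers per batch, the
	 * int product nbatch * 2 * BLCKSZ would be 2^17 * 2 * 2^13 = 2^31,
	 * one past INT_MAX, i.e. signed overflow.  Check before multiplying,
	 * rather than executing the overflow ...
	 */
	if (nbatch > INT_MAX / (2 * BLCKSZ))
		printf("int arithmetic would overflow here\n");

	/* ... or widen one operand so the product is computed in 64 bits. */
	printf("batch buffers: %lld bytes\n",
		   (long long) ((int64_t) nbatch * 2 * BLCKSZ));

	return 0;
}
```

With minbatch = 100000 the sketch prints nbatch = 131072 and reports that 32-bit arithmetic would wrap, matching the ~130K threshold Chao mentions. Doing the sizing arithmetic in a wider type (or capping nbatch) is the general shape of a fix; the attached patch is the authoritative version.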
0002-Fix-overflow-of-nbatch.patch