On Sun, Mar 1, 2026, at 22:12, Tom Lane wrote:
> Aside: you could argue that failing to consider stanullfrac is wrong,
> and maybe it is. But the more I looked at this code the more
> convinced I got that it was only partially accounting for nulls
> anyway. That seems like perhaps something to look into later.
How about adjusting estfract for the null fraction before clamping?
```diff
@@ -4461,20 +4473,27 @@ estimate_hash_bucket_stats(PlannerInfo *root, Node *hashkey, double nbuckets,
     /*
      * Initial estimate of bucketsize fraction is 1/nbuckets as long as the
      * number of buckets is less than the expected number of distinct values;
      * otherwise it is 1/ndistinct.
      */
     if (ndistinct > nbuckets)
         estfract = 1.0 / nbuckets;
     else
         estfract = 1.0 / ndistinct;

+    /*
+     * Adjust for null fraction. NULL keys are not inserted into the hash
+     * table, but inner_path_rows in final_cost_hashjoin includes them, so we
+     * must discount estfract to compensate.
+     */
+    estfract *= (1.0 - stanullfrac);
+
     /*
      * Clamp the bucketsize fraction to be not less than the MCV frequency,
      * since whichever bucket the MCV values end up in will have at least that
      * size. This has no effect if *mcv_freq is still zero.
      */
     estfract = Max(estfract, *mcv_freq);

     *bucketsize_frac = (Selectivity) estfract;

     ReleaseVariableStats(vardata);
```
Here is an extreme example where 98% of all hj_nullfrac_inner.val values are NULL:

```sql
CREATE TABLE hj_nullfrac_inner (id int, val int);
INSERT INTO hj_nullfrac_inner
SELECT g, CASE WHEN g <= 1000 THEN g ELSE NULL END
FROM generate_series(1, 50000) g;
CREATE INDEX ON hj_nullfrac_inner(val);

CREATE TABLE hj_nullfrac_outer (id int, val int);
INSERT INTO hj_nullfrac_outer
SELECT g, ((g-1) % 1000)+1
FROM generate_series(1, 500000) g;

ANALYZE hj_nullfrac_inner;
ANALYZE hj_nullfrac_outer;

EXPLAIN ANALYZE
SELECT * FROM hj_nullfrac_outer o JOIN hj_nullfrac_inner i ON o.val = i.val;
```
Both master (HEAD) and v2 result in the same plan:
```
                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.30..20022.95 rows=499834 width=16) (actual time=0.016..289.314 rows=500000.00 loops=1)
   Buffers: shared hit=5212 read=1
   ->  Seq Scan on hj_nullfrac_outer o  (cost=0.00..7213.00 rows=500000 width=8) (actual time=0.008..31.449 rows=500000.00 loops=1)
         Buffers: shared hit=2213
   ->  Memoize  (cost=0.30..0.32 rows=1 width=8) (actual time=0.000..0.000 rows=1.00 loops=500000)
         Cache Key: o.val
         Cache Mode: logical
         Estimates: capacity=1000 distinct keys=1000 lookups=500000 hit percent=99.80%
         Hits: 499000  Misses: 1000  Evictions: 0  Overflows: 0  Memory Usage: 106kB
         Buffers: shared hit=2999 read=1
         ->  Index Scan using hj_nullfrac_inner_val_idx on hj_nullfrac_inner i  (cost=0.29..0.31 rows=1 width=8) (actual time=0.002..0.002 rows=1.00 loops=1000)
               Index Cond: (val = o.val)
               Index Searches: 1000
               Buffers: shared hit=2999 read=1
 Planning:
   Buffers: shared hit=87 read=7
 Planning Time: 0.419 ms
 Execution Time: 303.107 ms
(18 rows)
```
With v2+stanullfrac adjustment, we instead get a Hash Join, which is faster:
```
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1347.00..15436.61 rows=500161 width=16) (actual time=6.054..153.595 rows=500000.00 loops=1)
   Hash Cond: (o.val = i.val)
   Buffers: shared hit=2435
   ->  Seq Scan on hj_nullfrac_outer o  (cost=0.00..7213.00 rows=500000 width=8) (actual time=0.008..31.597 rows=500000.00 loops=1)
         Buffers: shared hit=2213
   ->  Hash  (cost=722.00..722.00 rows=50000 width=8) (actual time=6.041..6.042 rows=1000.00 loops=1)
         Buckets: 65536  Batches: 1  Memory Usage: 552kB
         Buffers: shared hit=222
         ->  Seq Scan on hj_nullfrac_inner i  (cost=0.00..722.00 rows=50000 width=8) (actual time=0.004..3.016 rows=50000.00 loops=1)
               Buffers: shared hit=222
 Planning:
   Buffers: shared hit=6
 Planning Time: 0.130 ms
 Execution Time: 167.903 ms
(14 rows)
```
Here is elog debugging output comparing v2 vs v2+stanullfrac adjustment for the above example.

v2:

```
ndistinct=1003.000000
nbuckets=65536.000000
stanullfrac=0.979933
mcv_freq=0.000020
estfract before clamping=0.000997
estfract after clamping=0.000997
```

v2+stanullfrac adjustment:

```
ndistinct=972.000000
nbuckets=65536.000000
stanullfrac=0.980567
mcv_freq=0.000020
estfract before stanullfrac adjustment=0.001029
estfract after stanullfrac adjustment=0.000020
estfract after clamping=0.000020
```
/Joel
fix-stanullfrac.patch
