"Joel Jacobson" <[email protected]> writes:
> On Sun, Mar 1, 2026, at 01:08, Tom Lane wrote:
>> Anyway, we all seem to agree that s/rel->rows/rel->tuples/ is the
>> correct fix for a newly-introduced bug. I'm inclined to proceed
>> by committing that fix (along with any regression test fallout)
>> and then investigating the avgfreq change as an independent matter.
> Yes, it seems fine to do them as separate fixes. The argument against
> would be if the two separate bugs would somehow compensate for each
> other, but I think they compound.
Did that. I was interested to see that this resulted in reverting
every single one of the plan changes induced by bd3e3e9e5, except the
one that involved applying a hash join despite having no statistics.
(That can be blamed on final_cost_hashjoin failing to disregard a
zero mcv_freq value, and really has nothing to do with
estimate_hash_bucket_stats at all.)
> Assuming unique column, 100k rows, no nulls, 1% selectivity (rows=1000):
>
> Master (both bugs): estfract = 100000/1000² = 0.1 (10000x too high)
> Only mcv_freq fix: estfract = 1/1000 = 0.001 (100x too high)
> Both fixes: estfract = 1/100000 = 0.00001 (correct)
I don't buy that that's correct. For a unique column, the result
should be basically 1/nbuckets or 1/ndistinct, whichever is larger;
in this case it's probably bounded by ndistinct=1000, so that 0.001
is the right answer.
Nonetheless, it's inarguable that the code's doing the wrong thing
with the examples you presented upthread. After staring at it for
a long while I noticed that with your proposed patch, supposing
that stanullfrac = 0 and ndistinct is small, we compute both avgfreq
and the initial estfract as 1 over (corrected) ndistinct. So the
adjustment corresponds to just setting estfract equal to mcv_freq
in this case. However, if nbuckets decreases below ndistinct,
estfract rises and then we're scaling it up by some amount. That
doesn't make a lot of sense: decreasing the number of buckets
doesn't change the number of MCV values. Also, increasing stanullfrac
results in decreasing avgfreq and therefore making the correction
larger, which doesn't seem to make sense either.
What seems more sensible, I think, is just to clamp estfract to be at
least mcv_freq always, dispensing with a whole bunch of the complication
here. We then don't need avgfreq at all, and therefore not stanullfrac
either, and the ad-hoc range clamp at the bottom no longer seems
necessary. Interestingly, this also makes the logic more
nearly like the "isdefault" early exit:
*bucketsize_frac = (Selectivity) Max(0.1, *mcv_freq);
which IIRC was added a lot later than the original logic.
So I end up with the attached draft patch. Very interestingly,
neither this version nor your lets-calculate-avgfreq-later
patch changes any regression tests at all compared to git HEAD.
Maybe we should try to add a case that does change.
Aside: you could argue that failing to consider stanullfrac is wrong,
and maybe it is. But the more I looked at this code the more
convinced I got that it was only partially accounting for nulls
anyway. That seems like perhaps something to look into later.
regards, tom lane
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index bf6bb624942..9f5178620ef 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -4373,8 +4373,8 @@ estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
* exactly those that will be probed most often. Therefore, the "average"
* bucket size for costing purposes should really be taken as something close
* to the "worst case" bucket size. We try to estimate this by adjusting the
- * fraction if there are too few distinct data values, and then scaling up
- * by the ratio of the most common value's frequency to the average frequency.
+ * fraction if there are too few distinct data values, and then clamping to
+ * at least the bucket size implied by the most common value's frequency.
*
* If no statistics are available, use a default estimate of 0.1. This will
* discourage use of a hash rather strongly if the inner relation is large,
@@ -4394,9 +4394,7 @@ estimate_hash_bucket_stats(PlannerInfo *root, Node *hashkey, double nbuckets,
{
VariableStatData vardata;
double estfract,
- ndistinct,
- stanullfrac,
- avgfreq;
+ ndistinct;
bool isdefault;
AttStatsSlot sslot;
@@ -4446,20 +4444,6 @@ estimate_hash_bucket_stats(PlannerInfo *root, Node *hashkey, double nbuckets,
return;
}
- /* Get fraction that are null */
- if (HeapTupleIsValid(vardata.statsTuple))
- {
- Form_pg_statistic stats;
-
- stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
- stanullfrac = stats->stanullfrac;
- }
- else
- stanullfrac = 0.0;
-
- /* Compute avg freq of all distinct data values in raw relation */
- avgfreq = (1.0 - stanullfrac) / ndistinct;
-
/*
* Adjust ndistinct to account for restriction clauses. Observe we are
* assuming that the data distribution is affected uniformly by the
@@ -4485,20 +4469,11 @@ estimate_hash_bucket_stats(PlannerInfo *root, Node *hashkey, double nbuckets,
estfract = 1.0 / ndistinct;
/*
- * Adjust estimated bucketsize upward to account for skewed distribution.
- */
- if (avgfreq > 0.0 && *mcv_freq > avgfreq)
- estfract *= *mcv_freq / avgfreq;
-
- /*
- * Clamp bucketsize to sane range (the above adjustment could easily
- * produce an out-of-range result). We set the lower bound a little above
- * zero, since zero isn't a very sane result.
+ * Clamp the bucketsize fraction to be not less than the MCV frequency,
+ * since whichever bucket the MCV values end up in will have at least that
+ * size. This has no effect if *mcv_freq is still zero.
*/
- if (estfract < 1.0e-6)
- estfract = 1.0e-6;
- else if (estfract > 1.0)
- estfract = 1.0;
+ estfract = Max(estfract, *mcv_freq);
*bucketsize_frac = (Selectivity) estfract;