Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq

Tender Wang Fri, 27 Feb 2026 15:57:48 -0800

Tom Lane <[email protected]> wrote on Sat, Feb 28, 2026 at 03:15:
>
> Tender Wang <[email protected]> writes:
> > I added Tom to the cc list. He may know more about this.
>
> Hmm, git blame says I originated this function 25 years ago
> (f905d65ee), but I don't claim to remember that.
>
> Looking at it now, though, I think that bd3e3e9e5 is indeed
> wrong but not in the way Joel suggests.  The longstanding
> way to compute mcv_freq is
>
>             /*
>              * The first MCV stat is for the most common value.
>              */
>             if (sslot.nnumbers > 0)
>                 *mcv_freq = sslot.numbers[0];
>
> *This number is a fraction measured on the raw relation.*
> (Necessarily so, because it's just a number computed by ANALYZE.)
> Then bd3e3e9e5 added
>
>             /*
>              * If there are no recorded MCVs, but we do have a histogram, then
>              * assume that ANALYZE determined that the column is unique.
>              */
>             if (vardata.rel && vardata.rel->rows > 0)
>                 *mcv_freq = 1.0 / vardata.rel->rows;
>
> This is a pure thinko.  rel->rows is the estimated number
> of filtered rows.  What I should have used is rel->tuples,
> which is the estimated raw relation size, so as to get a
> number that is commensurate with the longstanding way
> of calculating mcv_freq.  Then that also matches up with
> computing avgfreq on the raw relation.
>
> So I think the correct fix is basically
>
> -            if (vardata.rel && vardata.rel->rows > 0)
> -                *mcv_freq = 1.0 / vardata.rel->rows;
> +            if (vardata.rel && vardata.rel->tuples > 0)
> +                *mcv_freq = 1.0 / vardata.rel->tuples;
>


Yeah, in my last email I said I had tried it this way, but I was worried
that rel->tuples may be zero for an empty relation.
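
To make the empty-relation concern concrete, here is a standalone sketch,
not the actual selfuncs.c code. RelSizes and unique_mcv_freq are
hypothetical names used only for illustration, and returning 0.0 to mean
"no estimate" is an assumption of the sketch. It keeps the tuples > 0
guard from the proposed diff, so the division is simply skipped when the
raw relation size is zero.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two relation size fields discussed above. */
typedef struct RelSizes
{
    double rows;    /* estimated row count after restriction clauses */
    double tuples;  /* estimated row count of the raw relation */
} RelSizes;

/*
 * Frequency to assume for the most common value of a column believed
 * to be unique.  Measured against the raw relation (tuples), so it is
 * commensurate with the MCV fractions computed by ANALYZE.
 *
 * Returns 0.0 (assumed here to mean "unknown") when there is no
 * relation or its raw size estimate is not positive.
 */
static double
unique_mcv_freq(const RelSizes *rel)
{
    if (rel == NULL || rel->tuples <= 0)
        return 0.0;             /* guard: never divide by zero */
    return 1.0 / rel->tuples;
}
```

With rows = 10 and tuples = 1000, this yields 1/1000 rather than the
1/10 that dividing by rows would give. With tuples = 0 it yields 0.0.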

> and I wonder if that will wind up in reverting a lot of the plan
> choice changes seen in bd3e3e9e5.

Yes, there are a lot of plan diffs in partition_join.sql.



-- 
Thanks,
Tender Wang
