Tom Lane <[email protected]> 于2026年2月28日周六 03:15写道: > > Tender Wang <[email protected]> writes: > > I added Tom to the cc list. He may know more about this. > > Hmm, git blame says I originated this function 25 years ago > (f905d65ee), but I don't claim to remember that. > > Looking at it now, though, I think that bd3e3e9e5 is indeed > wrong but not in the way Joel suggests. The longstanding > way to compute mcv_freq is > > /* > * The first MCV stat is for the most common value. > */ > if (sslot.nnumbers > 0) > *mcv_freq = sslot.numbers[0]; > > *This number is a fraction measured on the raw relation.* > (Necessarily so, because it's just a number computed by ANALYZE.) > Then bd3e3e9e5 added > > /* > * If there are no recorded MCVs, but we do have a histogram, then > * assume that ANALYZE determined that the column is unique. > */ > if (vardata.rel && vardata.rel->rows > 0) > *mcv_freq = 1.0 / vardata.rel->rows; > > This is a pure thinko. rel->rows is the estimated number > of filtered rows. What I should have used is rel->tuples, > which is the estimated raw relation size, so as to get a > number that is commensurate with the longstanding way > of calculating mcv_freq. Then that also matches up with > computing avgfreq on the raw relation. > > So I think the correct fix is basically > > - if (vardata.rel && vardata.rel->rows > 0) > - *mcv_freq = 1.0 / vardata.rel->rows; > + if (vardata.rel && vardata.rel->tuples > 0) > + *mcv_freq = 1.0 / vardata.rel->tuples; >
Yeah, in my last email, I said I tried this way. But I worried that rel->tuples may be zero for an empty relation. > and I wonder if that will wind up in reverting a lot of the plan > choice changes seen in bd3e3e9e5. Yes, a lot plan diff in partition_join.sql. -- Thanks, Tender Wang
