Re: Improvement of var_eq_non_const()

Ilia Evdokimov Fri, 04 Apr 2025 11:29:13 -0700


On 20.02.2025 21:21, Tom Lane wrote:

Teodor Sigaev <teo...@sigaev.ru> writes:

I'd like to suggest to improve var_eq_non_const() by using knowledge of MCV and
estimate the selectivity as quadratic mean of non-null fraction divided by
number of distinct values (as it was before) and set of MCV selectivities.

What's the statistical interpretation of this calculation (that is,
the average MCV selectivity)?  Maybe it's better, but without any
context it seems like a pretty random thing to do.  In particular,
it seems like this could give radically different answers depending
on how many MCVs we chose to store, and I'm not sure we could argue
that the result gets more accurate with more MCVs stored.

                        regards, tom lane

Hi,

The arithmetic mean is not exactly the same as the root mean square
approach implemented by Teodor. The key difference is that the root mean
square is more influenced by the largest values in the distribution. The
further the data deviates from a uniform distribution, the less
representative a simple arithmetic mean becomes.
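
To illustrate the difference, here is a minimal sketch in Python. It is
not taken from the patch: the two frequency lists are made-up numbers
standing in for MCV frequencies, and the helper names arithmetic_mean()
and root_mean_square() are hypothetical. Both lists sum to the same
total, so their arithmetic means coincide, while the root mean square
grows as the list becomes more skewed.

```python
import math


def arithmetic_mean(values):
    """Plain average of the values."""
    return sum(values) / len(values)


def root_mean_square(values):
    """Quadratic mean: square root of the average of the squares."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical MCV frequency lists (illustrative numbers only).
# Both sum to 0.40, so the arithmetic mean cannot tell them apart.
uniform = [0.10, 0.10, 0.10, 0.10]
skewed = [0.31, 0.05, 0.02, 0.02]

for name, freqs in (("uniform", uniform), ("skewed", skewed)):
    print(name,
          "arithmetic mean =", round(arithmetic_mean(freqs), 4),
          "root mean square =", round(root_mean_square(freqs), 4))
```

For the uniform list the two means agree; for the skewed list the root
mean square is pulled toward the largest frequency, which is the
behaviour described above.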

Teodor's idea seems quite useful to me because it ensures that
selectivity is now influenced by multiple significant values from the
MCV list, rather than just the single most frequent one. This should
lead to a more accurate selectivity estimate, better reflecting the
actual data distribution.


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
