"Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> Alright.  I'm attaching the latest version of this patch split in two
> parts: the first one is NULLs-related bugfix and the second is the
> "improvement" part, which applies on top of the first one.

I've applied the first of these patches, broken into two parts first
because it seemed like there were two issues and second because Tomas
deserved primary credit for one part, ie realizing we were using the
Haas-Stokes formula wrong.

As for the other part, I committed it with one non-cosmetic change:
I do not think it is right to omit "too wide" values when considering
the threshold for MCVs.  As submitted, the patch was inconsistent on
that point anyway since it did it differently in compute_distinct_stats
and compute_scalar_stats.  But the larger picture here is that we define
the MCV population to exclude nulls, so it's reasonable to consider a
value as an MCV even if it's greatly outnumbered by nulls.  There is
no such exclusion for "too wide" values; those things are just an
implementation limitation in analyze.c, not something that is part of
the pg_statistic definition.  If there are a lot of "too wide" values
in the sample, we don't know whether any of them are duplicates, but
we do know that the frequencies of the normal-width values have to be
discounted appropriately.

Haven't looked at 0002 yet.

                        regards, tom lane

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to