On Apr 1, 2016 23:14, "Tom Lane" <t...@sss.pgh.pa.us> wrote:
> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> > Alright. I'm attaching the latest version of this patch split in two
> > parts: the first one is NULLs-related bugfix and the second is the
> > "improvement" part, which applies on top of the first one.
> I've applied the first of these patches,
Great news, thank you!
> broken into two parts first
> because it seemed like there were two issues and second because Tomas
> deserved primary credit for one part, ie realizing we were using the
> Haas-Stokes formula wrong.
> As for the other part, I committed it with one non-cosmetic change:
> I do not think it is right to omit "too wide" values when considering
> the threshold for MCVs. As submitted, the patch was inconsistent on
> that point anyway since it did it differently in compute_distinct_stats
> and compute_scalar_stats. But the larger picture here is that we define
> the MCV population to exclude nulls, so it's reasonable to consider a
> value as an MCV even if it's greatly outnumbered by nulls. There is
> no such exclusion for "too wide" values; those things are just an
> implementation limitation in analyze.c, not something that is part of
> the pg_statistic definition. If there are a lot of "too wide" values
> in the sample, we don't know whether any of them are duplicates, but
> we do know that the frequencies of the normal-width values have to be
> discounted appropriately.
> Haven't looked at 0002 yet.
[crosses fingers] hope you'll have a chance to do that before feature
freeze for 9.6…