On Wed, Oct 20, 2010 at 6:03 PM, Josh Berkus <j...@agliodbs.com> wrote:
> I also just realized that I confused myself ... we don't really want
> more MCVs.  What we want is more *samples* to derive a small number of
> MCVs.  Right now the number of samples and the number of MCVs are
> inextricably bound, and they shouldn't be.  On larger tables, you're
> correct that we don't
> necessarily want more MCVs, we just need more samples to figure out
> those MCVs accurately.

I don't see why the MCVs would need a particularly large sample size
to be calculated accurately. Have you done any tests on the accuracy
of the MCV list?
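As a rough illustration of how MCV frequency estimates tighten with sample size, here's a quick simulation. Everything in it is invented for illustration (the skewed `population`, the Zipf-ish shape, the top-10 cutoff); it is not PostgreSQL's actual ANALYZE machinery, just the basic statistics of estimating the most common values' frequencies from a random sample:

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical skewed column: ~1000 distinct values, value v appears
# roughly 1000/(v+1) times, so a handful of values dominate.
population = [v for v in range(1000) for _ in range(1000 // (v + 1))]

def mcv_estimates(sample_size, n_mcvs=10):
    """Estimate the top-N value frequencies from a random sample."""
    sample = random.choices(population, k=sample_size)  # sample with replacement
    counts = Counter(sample)
    return {v: c / sample_size for v, c in counts.most_common(n_mcvs)}

# True frequencies of the 10 most common values, for comparison.
true_freqs = {v: c / len(population)
              for v, c in Counter(population).most_common(10)}

results = {}
for n in (300, 3000, 30000):
    est = mcv_estimates(n)
    # Mean absolute error of the estimated top-10 frequencies vs. the truth.
    err = sum(abs(est.get(v, 0.0) - f) for v, f in true_freqs.items()) / len(true_freqs)
    results[n] = err
    print(f"sample={n:6d}  mean abs error of top-10 freqs={err:.4f}")
```

For genuinely common values the standard error of a sampled proportion shrinks like 1/sqrt(n), so even modest samples pin down the top frequencies fairly well; the errors above fall roughly tenfold from n=300 to n=30000.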

Robert explained why having more MCVs might be useful: we use the
frequency of the least common MCV as an upper bound on the frequency
of any value not in the list. That seems logical, but it's all about
the number of MCV entries, not their accuracy. And mostly what it
tells me is that we need a robust statistical method, and the data
structures it requires, for estimating the frequency of a single
value.
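That upper-bound argument can be sketched as follows. This is a simplified illustration loosely modeled on the planner's equality-selectivity logic, not the real implementation; `estimate_eq_selectivity`, its inputs, and the example numbers are all hypothetical:

```python
def estimate_eq_selectivity(mcv_freqs, n_distinct, value):
    """Estimate the fraction of rows matching `value = const`.

    If `value` is in the MCV list, use its measured frequency directly.
    Otherwise, spread the remaining frequency mass evenly over the
    non-MCV distinct values, and cap the result at the least common
    MCV's frequency: a non-MCV value can't be more frequent than that,
    or it would have made the list.
    """
    if value in mcv_freqs:
        return mcv_freqs[value]
    remaining = 1.0 - sum(mcv_freqs.values())      # mass not covered by MCVs
    n_other = n_distinct - len(mcv_freqs)          # distinct values outside the list
    avg = remaining / n_other
    return min(avg, min(mcv_freqs.values()))       # the upper bound in question

# Hypothetical stats: 3 MCVs out of 10 distinct values.
mcvs = {"a": 0.30, "b": 0.20, "c": 0.10}
print(estimate_eq_selectivity(mcvs, 10, "a"))  # in the list -> 0.3
print(estimate_eq_selectivity(mcvs, 10, "z"))  # capped by the least common MCV
```

With more MCV entries the least common MCV's frequency drops, so the cap on non-MCV values tightens. That's the sense in which list *length* matters independently of how accurately each entry was measured.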

Binding the length of the MCV list to the size of the histogram is
arbitrary, but so would any other value be, and I haven't seen anyone
propose a rationale for any particular value. The only rationale I can
see is that we probably want it to take roughly the same amount of
space as the existing stats -- and that means we probably want it to
be roughly the same size.




-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
