On Wed, Oct 20, 2010 at 9:53 PM, Nathan Boley <npbo...@gmail.com> wrote:
>> Robert explained why having more MCVs might be useful because we use
>> the frequency of the least common MCV as an upper bound on the
>> frequency of any value not in the MCV list.
>
> Where is that being used?

var_eq_const

> The only non-MCV frequency estimate that I
> recall seeing is (nrows - n_ndistinct_rows)/ndistinct. Obviously
> changing the number of MCVs affects this by lowering
> n_ndistinct_rows, but it's always a pretty coarse estimate.

That one's used, too, but the other is used as an upper bound.
n_distinct tends to come out too small on large tables, so that
formula is prone to overestimation.  Actually, both formulas are prone
to overestimation.
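For anyone following along, here's a minimal Python sketch of the estimate being discussed. The real logic is var_eq_const in selfuncs.c; the function name, structure, and numbers below are illustrative only, not the actual PostgreSQL code:

```python
# Simplified sketch of the var_eq_const estimate discussed above.
# Illustrative only -- the real implementation is C code in selfuncs.c.

def var_eq_const_sketch(mcv, ndistinct, value):
    """mcv: dict mapping value -> frequency (fraction of the table).
    Returns the estimated fraction of rows equal to `value`."""
    if value in mcv:
        # Value is in the MCV list: use its measured frequency directly.
        return mcv[value]
    # Otherwise, spread the leftover frequency evenly across the
    # distinct values not covered by the MCV list...
    rest_freq = 1.0 - sum(mcv.values())
    est = rest_freq / (ndistinct - len(mcv))
    # ...but never estimate a non-MCV value as more frequent than the
    # least common MCV: that's the upper bound mentioned above.
    return min(est, min(mcv.values()))

stats = {'a': 0.30, 'b': 0.20, 'c': 0.10}   # hypothetical MCV stats
print(var_eq_const_sketch(stats, ndistinct=5, value='a'))  # prints 0.3
print(var_eq_const_sketch(stats, ndistinct=5, value='x'))  # prints 0.1
```

Note how the clamp bites in the second call: the even-spread estimate would be 0.2, but the least common MCV is only 0.1, so the bound kicks in. If n_distinct is underestimated, the even-spread term grows, and the bound is the only thing keeping the estimate honest.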

>>  Binding the length of the MCV list to the size of the histogram is
>> arbitrary, but so would any other choice be.
>
> Wouldn't the best approach be to stop adding MCV's/histogram buckets
> when adding new ones doesn't decrease your prediction error
> 'substantially'?
>
> One very hacky threshold heuristic is to stop adding MCVs when a
> simple equality select ( SELECT col FROM table WHERE col = VALUE )
> would switch the plan from an index scan to a sequential scan ( or vice
> versa, although with the current code this would never happen ). I.e.,
> if the non-MCV frequency estimate is 0.1% ( producing an index scan ),

When this happens depends on the values of a whole boat-load of GUCs...
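To make that concrete, here is a toy cost comparison in the spirit of the planner. The cost formulas are drastically simplified assumptions (not costsize.c); only the GUC names in the comments, like seq_page_cost, random_page_cost, and cpu_tuple_cost, are real settings:

```python
# Toy model of the seq-scan vs. index-scan crossover. The formulas are
# drastic simplifications; only the parameter names echo real GUCs
# (seq_page_cost, random_page_cost, cpu_tuple_cost).

def seq_scan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    # A seq scan reads every page and examines every row.
    return pages * seq_page_cost + rows * cpu_tuple_cost

def index_scan_cost(rows, selectivity, random_page_cost=4.0,
                    cpu_tuple_cost=0.01):
    # Crude assumption: one random page fetch per matching row.
    matching = rows * selectivity
    return matching * random_page_cost + matching * cpu_tuple_cost

pages, rows = 10_000, 1_000_000
for sel in (0.001, 0.01, 0.1):
    idx, seq = index_scan_cost(rows, sel), seq_scan_cost(pages, rows)
    print(sel, "index" if idx < seq else "seq")
```

Raising random_page_cost pushes the crossover toward lower selectivities, and the enable_* GUCs can suppress one plan shape entirely, so any heuristic keyed to "where the plan flips" is really keyed to the local cost settings rather than to the statistics themselves.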

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)