On Wed, Oct 20, 2010 at 9:53 PM, Nathan Boley <npbo...@gmail.com> wrote:
>> Robert explained why having more MCVs might be useful because we use
>> the frequency of the least common MCV as an upper bound on the
>> frequency of any value in the MCV list.
>
> Where is that being used?
var_eq_const.

> The only non-MCV frequency estimate that I
> recall seeing is ( nrows - n_ndistinct_rows )/ndistinct. Obviously
> changing the number of mcv's affects this by lowering
> n_ndistinct_rows, but it's always a pretty coarse estimate.

That one's used, too, but the other is used as an upper bound.
n_distinct tends to come out too small on large tables, so that
formula is prone to overestimation. Actually, both formulas are
prone to overestimation.

>> Binding the length of the MCV list to the size of the histogram is
>> arbitrary, but so would any other value.
>
> Wouldn't the best approach be to stop adding MCV's/histogram buckets
> when adding new ones doesn't decrease your prediction error
> 'substantially'?
>
> One very hacky threshold heuristic is to stop adding MCV's when a
> simple equality select ( SELECT col FROM table WHERE col = VALUE )
> would switch the plan from an index to a sequential scan ( or vice
> versa, although with the current code this would never happen ). ie,
> if the non-MCV frequency estimate is 0.1% ( producing an index scan ),

When this happens depends on the values of a whole boat-load of GUCs...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
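[For readers following along: the two estimates being discussed can be
sketched roughly as below. This is an illustrative model of the
var_eq_const behavior described in the thread, not the actual backend
code; the function and variable names (estimate_non_mcv_selectivity,
mcv_freqs, ndistinct) are invented for the example.]

```python
# Rough sketch of the equality-selectivity estimate for a value that
# is NOT in the MCV list, including the "least common MCV as an upper
# bound" clamp discussed above. Not PostgreSQL source; names are
# illustrative only.

def estimate_non_mcv_selectivity(mcv_freqs, ndistinct):
    """Estimate the fraction of rows matching col = VALUE when VALUE
    is absent from the MCV list.

    mcv_freqs: MCV frequencies as fractions of the table, descending.
    ndistinct: estimated number of distinct values in the column.
    """
    sum_mcv = sum(mcv_freqs)
    other_distinct = ndistinct - len(mcv_freqs)
    if other_distinct <= 0:
        return 0.0
    # Spread the remaining probability mass evenly over non-MCV values
    # (the coarse ".../ndistinct" style estimate from the thread).
    selec = (1.0 - sum_mcv) / other_distinct
    # A non-MCV value can't plausibly be more frequent than the least
    # common MCV, so that frequency serves as an upper bound.
    if mcv_freqs:
        selec = min(selec, mcv_freqs[-1])
    return selec

# 3 MCVs covering 60% of rows, 100 distinct values in the column:
print(estimate_non_mcv_selectivity([0.3, 0.2, 0.1], 100))
```

Note how, if n_distinct is underestimated, other_distinct shrinks and
the per-value estimate inflates, which is the overestimation problem
mentioned above; the clamp against the last MCV frequency is what makes
a longer MCV list tighten the bound.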