> Robert explained why having more MCVs might be useful because we use > the frequency of the least common MCV as an upper bound on the > frequency of any value in the MCV.
Where is that being used? The only non-MCV frequency estimate that I recall seeing is ( nrows - n_ndistinct_rows )/ndistinct. Obviously changing the number of mcv's affects this by lowering n_ndistinct_rows, but it's always pretty coarse estimate. > Binding the length of the MCV list to the size of the histogram is > arbitrary but so would any other value Wouldn't the best approach be to stop adding MCV's/histogram buckets when adding new ones doesn't decrease your prediction error 'substantially'? One very hacky threshold heuristic is to stop adding MCV's when a simple equality select ( SELECT col FROM table WHERE col == VALUE ) would switch the plan from an index to a sequential scan ( or vice versa, although with the current code this would never happen ). ie, if the non_mcv frequency estimate is 0.1% ( producing an index scan ), but adding the MCV gives us an estimate of 5% ( pbly producing a seq scan ) then add that value as an mcv. More sophisticated variations might also consider plan changes to very suboptimal joins; even more sophisticated would be to stop when the MAX( curr - optimal plan / optimal plan ) was below some threshold, say 20%, over a bunch of recently executed queries. A similar approach would work for histogram bins, except the queries of interest are inequality rather than equality selections. -Nathan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers