Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

Nathan Boley Wed, 20 Oct 2010 18:53:59 -0700

> Robert explained why having more MCVs might be useful because we use
> the frequency of the least common MCV as an upper bound on the
> frequency of any value in the MCV.


Where is that being used? The only non-MCV frequency estimate that I
recall seeing is ( nrows - n_ndistinct_rows  )/ndistinct. Obviously
changing the number of mcv's affects this by lowering
n_ndistinct_rows, but it's always pretty coarse estimate.

>  Binding the length of the MCV list to the size of the histogram is
> arbitrary but so would any other value

Wouldn't the best approach be to stop adding MCV's/histogram buckets
when adding new ones doesn't decrease your prediction error
'substantially'?

One very hacky threshold heuristic is to stop adding MCV's when a
simple equality select (  SELECT col FROM table WHERE col == VALUE )
would switch the plan from an index to a sequential scan ( or vice
versa, although with the current code this would never happen ). ie,
if the non_mcv frequency estimate is 0.1% ( producing an index scan ),
but adding the MCV gives us an estimate of 5% ( pbly producing a seq
scan ) then add that value as an mcv. More sophisticated variations
might also consider plan changes to very suboptimal joins; even more
sophisticated would be to stop when the MAX( curr - optimal plan /
optimal plan ) was below some threshold, say 20%, over a bunch of
recently executed queries.

A similar approach would work for histogram bins, except the queries
of interest are inequality rather than equality selections.

-Nathan

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] default_statistics_target WAS: max_wal_senders must die

Reply via email to