On Wed, Oct 20, 2010 at 10:53 AM, Alvaro Herrera <alvhe...@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of mié oct 20 10:29:04 -0300 2010:
>
>> Actually, I think the best thing for default_statistics_target might
>> be to scale the target based on the number of rows in the table, e.g.
>> given N rows:
>>
>> 10 + (N / 1000), if N < 40,000
>> 46 + (N / 10000), if 50,000 < N < 3,540,000
>> 400, if N > 3,540,000
>>
>> Consider a table with 2,000 rows. With default_statistics_target =
>> 100, we can store up to 100 MCVs; and we break the remaining ~1,900
>> values up into 100 buckets with ~19 values/bucket.
>
> Maybe what should be done about this is to have separate sizes for the
> MCV list and the histogram, where the MCV list is automatically sized
> during ANALYZE.
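For concreteness, the proposed piecewise scaling could be sketched like this (a hypothetical illustration, not actual PostgreSQL code; note the quoted formula leaves 40,000-50,000 unspecified, so the sketch takes the middle branch's lower bound as 40,000, which makes the pieces meet, since 10 + 40000/1000 = 46 + 40000/10000 = 50):

```python
def scaled_statistics_target(n_rows):
    """Sketch of the proposed row-count-based statistics target.

    Hypothetical helper, not PostgreSQL source. Integer division is
    assumed, matching how such a target would likely be computed.
    """
    if n_rows < 40000:
        return 10 + n_rows // 1000
    elif n_rows <= 3540000:
        # Quoted lower bound is 50,000; 40,000 assumed here so the
        # pieces join continuously. Upper end: 46 + 354 = 400.
        return 46 + n_rows // 10000
    else:
        return 400
```

Under this sketch, a 2,000-row table gets a target of 12 and a 2,000,000-row table gets 246, the figure discussed below.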
I thought about that, but I'm not sure there's any particular advantage. Automatically scaling the histogram seems just as useful as automatically scaling the MCV list - both tend to reduce estimation error. For a table with 2,000,000 rows, raising the statistics target from 100 to the value computed by the above formula, which happens to be 246, will help the 101st through 246th most common values, because they will now be MCVs. It will also help all the remaining values, both because you've pulled 146 fairly common values out of the histogram buckets and because each bucket now contains ~8,130 values rather than ~20,000.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company