On Wed, 2010-10-20 at 15:15 -0700, Josh Berkus wrote:

> >> Maybe what should be done about this is to have separate sizes for the
> >> MCV list and the histogram, where the MCV list is automatically sized
> >> during ANALYZE.
>
> It's been suggested multiple times that we should base our sample size
> on a % of the table, or at least offer that as an option. I've pointed
> out (with math, which Simon wrote a prototype for) that doing
> block-based sampling instead of random-row sampling would allow us to
> collect, say, 2% of a very large table without more I/O than we're doing
> now.
>
> Nathan Boley has also shown that we could get tremendously better
> estimates without additional sampling if our statistics collector
> recognized common patterns such as normal, linear and geometric
> distributions. Right now our whole stats system assumes a completely
> random distribution.
>
> So, I think we could easily be quite a bit smarter than just increasing
> the MCV. Although that might be a nice start.
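For concreteness, the block-vs-row sampling trade-off described above can be sketched as follows. The page counts and table layout here are made up for illustration, and this is a simplified model, not PostgreSQL's actual ANALYZE code:

```python
# Sketch of why block-based sampling reads far fewer pages than
# random-row sampling for the same number of sampled rows.
# Table geometry below is hypothetical, not taken from the thread.
import random

BLOCKS = 10_000        # pages in the table (assumed)
ROWS_PER_BLOCK = 100   # tuples per page (assumed)

def row_sample_io(sample_rows, rng):
    """Random-row sampling: each sampled row may land on a different
    page, so the number of distinct pages read approaches sample_rows."""
    pages = {rng.randrange(BLOCKS) for _ in range(sample_rows)}
    return len(pages)

def block_sample_io(sample_rows, rng):
    """Block-based sampling: read whole pages and keep every row on
    them, so pages read is just ceil(sample_rows / rows per page)."""
    return -(-sample_rows // ROWS_PER_BLOCK)  # ceiling division

rng = random.Random(42)
target = BLOCKS * ROWS_PER_BLOCK * 2 // 100  # a 2% sample = 20,000 rows
print("row-based pages read:  ", row_sample_io(target, rng))
print("block-based pages read:", block_sample_io(target, rng))
```

With these numbers, the row-based approach touches most of the table's pages (roughly 1 - e^-2, about 86% of them) while the block-based approach reads only 2% of the pages, which is the intuition behind the "2% sample without more I/O" claim. The catch, not modeled here, is that rows within a block are correlated, so block sampling needs care to avoid biased statistics.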
References would be nice.

JD

> --
> -- Josh Berkus
> PostgreSQL Experts Inc.
> http://www.pgexperts.com

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers