On Wed, 2010-10-20 at 15:15 -0700, Josh Berkus wrote:
> >> Maybe what should be done about this is to have separate sizes for the
> >> MCV list and the histogram, where the MCV list is automatically sized
> >> during ANALYZE.
> 
> It's been suggested multiple times that we should base our sample size
> on a % of the table, or at least offer that as an option.  I've pointed
> out (with math; Simon wrote a prototype) that doing block-based
> sampling instead of random-row sampling would allow us to collect,
> say, 2% of a very large table without more I/O than we're doing now.
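The block-sampling arithmetic quoted above can be sketched as follows. This is a hypothetical illustration, not PostgreSQL code; the table geometry (blocks, tuples per block) is made up for the sake of the example:

```python
# Hypothetical illustration (not PostgreSQL code): comparing the I/O cost
# of random-row sampling vs block-based sampling. Geometry is assumed.
ROWS_PER_BLOCK = 100          # assumed average tuples per page
TABLE_BLOCKS = 1_000_000      # a "very large" table

def blocks_read_row_sampling(sample_rows):
    """Random-row sampling: each sampled row tends to hit a distinct block,
    so (for small sampling fractions) nearly every row costs one block read,
    capped at the table size."""
    return min(sample_rows, TABLE_BLOCKS)

def blocks_read_block_sampling(sample_rows):
    """Block-based sampling: read whole blocks and keep every row in them,
    so the I/O is the number of blocks needed to cover the sample."""
    return -(-sample_rows // ROWS_PER_BLOCK)  # ceiling division

target = int(0.02 * TABLE_BLOCKS * ROWS_PER_BLOCK)  # a 2% sample of rows
print(blocks_read_row_sampling(target))    # 1,000,000 block reads
print(blocks_read_block_sampling(target))  # 20,000 block reads
```

Under these assumed numbers, a 2% sample costs two orders of magnitude less I/O when taken block-wise, which is the point being made; the trade-off (not shown) is that rows within a block are correlated, so the estimator has to account for clustering.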
> 
> Nathan Boley has also shown that we could get tremendously better
> estimates without additional sampling if our statistics collector
> recognized common patterns such as normal, linear and geometric
> distributions.  Right now our whole stats system assumes a completely
> random distribution.
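The distribution-recognition idea quoted above can be sketched as follows. This is a hedged illustration, not a proposed implementation: it assumes a column that happens to be roughly normal, fits mean and standard deviation from a sample, and uses the fitted CDF to estimate the selectivity of a range predicate:

```python
import random
import statistics
from math import erf, sqrt

# Hypothetical sketch: if the stats collector recognized a roughly normal
# column, a range-selectivity estimate could use the fitted CDF instead of
# a histogram built under a no-pattern assumption.
random.seed(42)
sample = [random.gauss(100.0, 10.0) for _ in range(10_000)]

mu = statistics.fmean(sample)
sigma = statistics.stdev(sample)

def normal_cdf(x):
    """CDF of the fitted normal distribution."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Estimated selectivity of "col BETWEEN 90 AND 110" under the fitted model
# (~0.68, i.e. one sigma on each side), compared against the sample itself:
est = normal_cdf(110.0) - normal_cdf(90.0)
actual = sum(90.0 <= v <= 110.0 for v in sample) / len(sample)
print(round(est, 3), round(actual, 3))
```

The appeal is that two fitted parameters stand in for an arbitrarily fine histogram whenever the pattern actually holds; the hard part, not shown here, is deciding during ANALYZE whether the pattern holds at all.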
> 
> So, I think we could easily be quite a bit smarter than just
> increasing the MCV list size, although that might be a nice start.

References would be nice.

JD


> 
> -- 
>                                   -- Josh Berkus
>                                      PostgreSQL Experts Inc.
>                                      http://www.pgexperts.com
> 

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
