> I don't see why the MCVs would need a particularly large sample size
> to calculate accurately. Have you done any tests on the accuracy of
> the MCV list?

Yes, although I don't have them at my fingertips. In sum, though, you
can't take 10,000 samples from a 1-billion-row table and expect to get a
remotely accurate MCV list.

A while back I did a fair bit of reading on ndistinct and large tables
in the academic literature. The consensus of many papers was that it
takes a sample of at least 3% of the table (or 5% for block-based
sampling) to have 95% confidence that the ndistinct estimate is within
3X. I can't imagine that MCV is easier than this.

> And mostly
> what it tells me is that we need a robust statistical method and the
> data structures it requires for estimating the frequency of a single
> value.

Agreed.

> Binding the length of the MCV list to the size of the histogram is
> arbitrary but so would any other value and I haven't seen anyone
> propose any rationale for any particular value.

histogram size != sample size. It is in our code, but that's a bug and
not a feature.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
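
For anyone who wants to poke at the sample-size question above, here is a rough Python sketch. It is not PostgreSQL's ANALYZE code and not a measurement of it. It assumes a Zipf-like value distribution, draws rows with replacement from that distribution instead of sampling a real table, and uses hypothetical helper names (`true_frequencies`, `sampled_mcv`, `mcv_overlap`) invented for this illustration. It reports what fraction of the true top-k values show up in the MCV list built from a sample of a given size.

```python
import random
from collections import Counter


def true_frequencies(n_values, skew):
    """Zipf-like frequencies: value i gets weight 1 / (i + 1) ** skew.

    Values are numbered so that value 0 is the most common one.
    """
    weights = [1.0 / (i + 1) ** skew for i in range(n_values)]
    total = sum(weights)
    return [w / total for w in weights]


def sampled_mcv(freqs, sample_size, k, rng):
    """Draw sample_size rows (with replacement) and return the k most
    common sampled values with their estimated frequencies."""
    sample = rng.choices(range(len(freqs)), weights=freqs, k=sample_size)
    counts = Counter(sample)
    return [(value, count / sample_size)
            for value, count in counts.most_common(k)]


def mcv_overlap(mcv, k):
    """Fraction of the true top-k values present in the sampled MCV list.

    Because frequencies decrease with the value number, the true top-k
    values are simply 0 .. k-1.
    """
    true_top = set(range(k))
    return len(true_top & {value for value, _ in mcv}) / k


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Assumed workload: 100,000 distinct values with mild skew.
    freqs = true_frequencies(n_values=100_000, skew=0.5)
    for sample_size in (1_000, 10_000, 100_000):
        mcv = sampled_mcv(freqs, sample_size, k=100, rng=rng)
        print(sample_size, mcv_overlap(mcv, k=100))
```

Changing `skew` and `n_values` shows how much the answer depends on the shape of the data: with heavy skew a small sample finds the top values easily, while with a flatter distribution the sampled list is mostly noise. That is only a toy model, so treat the numbers as qualitative.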