Gregory Stark wrote: > Tom Lane <[EMAIL PROTECTED]> writes: > > > "Dann Corbit" <[EMAIL PROTECTED]> writes: > >> I also do not believe that there is any value that will be the right > >> answer. But a table of data might be useful both for people who want to > >> toy with altering the values and also for those who want to set the > >> defaults. I guess that at one time such a table was generated to > >> produce the initial estimates for default values. > > > > Sir, you credit us too much :-(. The actual story is that the current > > default of 10 was put in when we first implemented stats histograms, > > replacing code that kept track of only a *single* most common value > > (and not very well, at that). So it was already a factor of 10 more > > stats than we had experience with keeping, and accordingly conservatism > > suggested not boosting the default much past that. > > I think that's actually too little credit. The sample size is chosen quite > carefully based on solid mathematics to provide a specific confidence interval > estimate for queries covering ranges the size of a whole bucket. > > The actual number of buckets more of an arbitrary choice. It depends entirely > on how your data is distributed and how large a range your queries are > covering. A uniformly distributed data set should only need a single bucket to > generate good estimates. Less evenly distributed data sets need more. > > I wonder actually if there are algorithms for estimating the number of buckets > needed for a histogram to achieve some measurable goal. That would close the > loop. It would be much more reassuring to base the size of the sample on solid > statistics than on hunches.
I have a few thoughts on this. First, people are correct that there is no perfect default_statistics_target value. This is similar to the problem with the pre-8.4 max_fsm_pages/max_fsm_relations, for which there also was never a perfect value. But, if the FSM couldn't store all the free space, a server log message was issued that recommended increasing these values; the same is still done for checkpoint_segments. Is there a way we could emit a server log message to recommend increasing the statistics targets for specific columns? Also, is there a way to increase the efficiency of the statistics targets lookups? I assume the values are already sorted in the pg_statistic arrays; do we already do a binary lookup on those? Does that help? -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers