Re: [HACKERS] Simple postgresql.conf wizard

Gregory Stark Wed, 26 Nov 2008 05:48:06 -0800

Tom Lane <[EMAIL PROTECTED]> writes:

> "Dann Corbit" <[EMAIL PROTECTED]> writes:
>> I also do not believe that there is any value that will be the right
>> answer.  But a table of data might be useful both for people who want to
>> toy with altering the values and also for those who want to set the
>> defaults.  I guess that at one time such a table was generated to
>> produce the initial estimates for default values.
>
> Sir, you credit us too much :-(.  The actual story is that the current
> default of 10 was put in when we first implemented stats histograms,
> replacing code that kept track of only a *single* most common value
> (and not very well, at that).  So it was already a factor of 10 more
> stats than we had experience with keeping, and accordingly conservatism
> suggested not boosting the default much past that.


I think that's actually too little credit. The sample size is chosen quite
carefully based on solid mathematics to provide a specific confidence interval
estimate for queries covering ranges the size of a whole bucket.

The actual number of buckets more of an arbitrary choice. It depends entirely
on how your data is distributed and how large a range your queries are
covering. A uniformly distributed data set should only need a single bucket to
generate good estimates. Less evenly distributed data sets need more.

I wonder actually if there are algorithms for estimating the number of buckets
needed for a histogram to achieve some measurable goal. That would close the
loop. It would be much more reassuring to base the size of the sample on solid
statistics than on hunches.

> So we really don't have any methodically-gathered evidence about the
> effects of different stats settings.  It wouldn't take a lot to convince
> us to switch to a different default, I think, but it would be nice to
> have more than none.

I think the difficulty (aside from testing being laborious at the best of
times) is that it's heavily dependent on data sets which are hard to generate
good examples for. Offhand I would think the census data might make a good
starting point -- it should have columns which range from perfectly uniform to
highly skewed.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Simple postgresql.conf wizard

Reply via email to