"Dave Held" <[EMAIL PROTECTED]> writes: > > Actually, it's more to characterize how large of a sample > > we need. For example, if we sample 0.005 of disk pages, and > > get an estimate, and then sample another 0.005 of disk pages > > and get an estimate which is not even close to the first > > estimate, then we have an idea that this is a table which > > defies analysis based on small samples. > > I buy that.
Better yet is to use the entire sample you've gathered, the full 0.01,
and then analyze that sample to see what the confidence interval is.
That's effectively the same as what you're proposing, except that it
looks at every possible partition. Unfortunately, the reality according
to the papers that were sent earlier is that you will always find the
results disappointing: until your sample is nearly the entire table,
your estimates for n_distinct will be extremely unreliable.

--
greg
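A similarly hedged sketch of the partition idea above: chop the full
0.01 sample into disjoint chunks, estimate n_distinct from each chunk,
and read the spread of the per-chunk estimates as a rough confidence
interval. The estimator and the chunk count are the same illustrative
assumptions as in the previous snippet:

    # Partition the gathered sample and measure how much the per-chunk
    # n_distinct estimates disagree; a wide spread means the sample is
    # too small for a reliable estimate.
    import random

    def estimate_ndistinct(sample, total_rows):
        return len(set(sample)) * total_rows // len(sample)

    def ndistinct_spread(sample, total_rows, chunks=10):
        random.shuffle(sample)
        k = len(sample) // chunks
        ests = [estimate_ndistinct(sample[i * k:(i + 1) * k], total_rows)
                for i in range(chunks)]
        return min(ests), max(ests)

    total_rows = 1_000_000
    sample = [int(random.paretovariate(1.5) * 1000)
              for _ in range(total_rows // 100)]   # stands in for a 0.01 sample
    print(ndistinct_spread(sample, total_rows))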