Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

Josh Berkus Mon, 25 Apr 2005 12:14:32 -0700

Simon, Tom:

While it's not possible to get accurate estimates from a fixed-size sample, I 
think it would be possible from a small but scalable sample: say, 0.1% of all 
data pages on large tables, up to the limit of maintenance_work_mem.


Setting up these samples as a % of data pages, rather than a pure random sort, 
makes this more feasible; for example, a 70GB table would only need to sample 
about 9000 data pages (or 70MB).  Of course, larger samples would lead to 
better accuracy, and this could be set through revised GUCs (e.g., 
maximum_sample_size, minimum_sample_size).
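
To make the arithmetic above concrete, here is a rough sketch of the page
count for a 0.1% sample. It assumes the default 8KB page size; the function
name and its parameters are made up for illustration and are not existing
PostgreSQL settings or code.

```python
# Sketch only: sizes a page-based sample as a fraction of a table's pages,
# optionally capped by a memory limit such as maintenance_work_mem.
# BLOCK_SIZE assumes PostgreSQL's default 8KB page; sample_pages() and its
# parameters are hypothetical names used for illustration.

BLOCK_SIZE = 8192  # bytes per data page (assumed default)


def sample_pages(table_bytes, fraction=0.001, mem_limit_bytes=None,
                 block_size=BLOCK_SIZE):
    """Return (pages_to_sample, bytes_to_read) for a page-level sample."""
    # Round up: a partially filled last page still counts as a page.
    total_pages = -(-table_bytes // block_size)
    pages = int(total_pages * fraction)
    if mem_limit_bytes is not None:
        # Never sample more pages than fit in the memory limit.
        pages = min(pages, mem_limit_bytes // block_size)
    return pages, pages * block_size


if __name__ == "__main__":
    pages, nbytes = sample_pages(70 * 1024**3)
    print(pages, "pages,", round(nbytes / 1024**2, 1), "MB")
```

For a 70GB table this gives 9175 pages, a bit under 72MB, which is in line
with the "about 9000 data pages (or 70MB)" figure.  With a 16MB memory limit
the same table would be capped at 2048 pages.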

I just need a little help doing the math ... please?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

