Mischa Sandberg wrote:
Perhaps I can save you some time (yes, I have a degree in Math). If I understand correctly, you're trying to extrapolate from the correlation between a tiny sample and a larger sample. Introducing the tiny sample into any decision can only produce a less accurate result than just taking the larger sample on its own; GIGO. Whether they are consistent with one another has no bearing on whether the larger sample correlates with the whole population. You can think of the tiny sample as "anecdotal" evidence for wonder drugs.
Ok, good point.
I'm with Tom, though, in being very wary of solutions that require even one-off whole-table scans. Maybe we need an additional per-table statistics setting that could specify the sample size, either as an absolute number of rows or as a percentage of the table. It certainly seems that where the ratio of distinct values to rows (D/N) is around 0.3, the distinct-value estimates, on very large tables at least, are way, way off.
Or maybe we need to support more than one estimation method.
Or both ;-)
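
For the record, here's a quick back-of-the-envelope sketch (Python, nothing to do with the actual analyze.c code) of the Haas-Stokes "Duj1" estimator that, as I understand it, ANALYZE uses for n_distinct, run against a synthetic column with D/N ~ 0.3. The table, sample sizes and seed below are all made up purely for illustration, but the spread of the estimates at small sample sizes is the sort of thing I mean by "way, way off":

# Rough sketch (not PostgreSQL source) of the Haas-Stokes "Duj1"
# n_distinct estimator:
#
#     D_est = n*d / (n - f1 + f1*n/N)
#
# n  = rows sampled, N = rows in the table,
# d  = distinct values seen in the sample,
# f1 = values seen exactly once in the sample.
import random
from collections import Counter

def estimate_ndistinct(sample, total_rows):
    n = len(sample)
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    if f1 == n:
        # every sampled value was unique; treat the column as all-distinct
        return float(total_rows)
    return n * d / (n - f1 + f1 * n / total_rows)

# Toy stand-in for a 10M-row table whose column has D/N ~ 0.3 (values
# uniformly distributed); sampling rows from such a table is approximated
# here by drawing values directly from the value range.
N = 10_000_000
true_d = int(0.3 * N)
random.seed(1)

for sample_size in (3_000, 30_000, 300_000):
    for trial in range(3):
        sample = [random.randrange(true_d) for _ in range(sample_size)]
        est = estimate_ndistinct(sample, N)
        print(f"n={sample_size:>7}  trial {trial}:  "
              f"estimate={est:>12,.0f}   true D ~ {true_d:,}")

At the default-ish sample sizes the estimate bounces around by millions from one run to the next; only the largest sample settles anywhere near the true value, which is why a per-table knob (or a different estimator) looks attractive.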
cheers
andrew