Perhaps I can save you some time (yes, I have a degree in Math). If I understand correctly, you're trying to extrapolate from the correlation between a tiny sample and a larger sample. Introducing the tiny sample into any decision can only produce a less accurate result than just taking the larger sample on its own; GIGO. Whether they are consistent with one another has no relationship to whether the larger sample correlates with the whole population. You can think of the tiny sample like "anecdotal" evidence for wonder drugs.

Ok, good point.

`I'm with Tom though in being very wary of solutions that require even one-off whole table scans. Maybe we need an additional per-table statistics setting which could specify the sample size, either as an absolute number or as a percentage of the table. It certainly seems that where D/N ~ 0.3, the estimates on very large tables at least are way way out.`

Or maybe we need to support more than one estimation method.

Or both ;-)
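
To make the two ideas a bit more concrete, here is a rough Python sketch, not a proposal for actual syntax or code. `resolve_sample_size` and its `setting` argument are hypothetical names for the per-table knob (an int means an absolute row count, a float means a fraction of the table). `estimate_distinct` assumes a Haas-Stokes style estimator, n*d / (n - f1 + f1*n/N), which nobody in this thread has named; it is only there to show how sensitive such an estimate can be when nearly every sampled value is seen once.

```python
def resolve_sample_size(total_rows, setting):
    """Hypothetical per-table setting.

    int   -> absolute number of rows to sample
    float -> fraction of the table, in (0, 1]
    """
    if isinstance(setting, float):
        if not 0.0 < setting <= 1.0:
            raise ValueError("fraction must be in (0, 1]")
        rows = round(total_rows * setting)
    else:
        rows = int(setting)
    # Never sample fewer than one row or more rows than the table holds.
    return max(1, min(rows, total_rows))


def estimate_distinct(n, N, d, f1):
    """Assumed Haas-Stokes style estimator (an assumption, see above).

    n  = rows in the sample
    N  = rows in the table
    d  = distinct values seen in the sample
    f1 = values seen exactly once in the sample
    """
    denom = n - f1 + f1 * n / N
    return n * d / denom


if __name__ == "__main__":
    N = 1_000_000
    n = resolve_sample_size(N, 3000)
    # Two samples that differ by only a handful of repeated values.
    print(estimate_distinct(n, N, d=2990, f1=2980))
    print(estimate_distinct(n, N, d=2995, f1=2990))
```

With a 3000-row sample from a million-row table, the two calls differ by just five distinct values in the sample, yet the estimates differ by roughly half. A larger configurable sample, a second estimation method, or both would each attack that sensitivity from a different side.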

