Josh Berkus wrote:

Okay, although given the track record of page-based sampling for
n-distinct, it's a bit like looking for your keys under the streetlight,
rather than in the alley where you dropped them :-)

Bad analogy, but funny.

The issue with page-based vs. pure random sampling is that to do, for example,
10% of rows purely randomly would actually mean loading 50% of pages.  With
20% of rows, you might as well scan the whole table.

Unless, of course, we use indexes for sampling, which seems like a *really
good* idea to me ....

But doesn't an index only sample one column at a time, whereas with page-based sampling, you can sample all of the columns at once. And not all columns would have indexes, though it could be assumed that if a column doesn't have an index, then it doesn't matter as much for calculations such as n_distinct.

But if you had 5 indexed rows in your table, then doing it index wise
means you would have to make 5 passes instead of just one.

Though I agree that page-based sampling is important for performance


Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to