On 2015-08-06 22:25, Josh Berkus wrote:
On 08/06/2015 01:19 PM, Simon Riggs wrote:
For me, the docs seem exactly correct. The mathematical implications of
that just aren't recorded explicitly.

Well, for the SELECT page, all we need is the following (one changed
sentence):

The SYSTEM method is significantly faster than the BERNOULLI method when
small sampling percentages are specified, but it may return a
less-random sample of the table as a result of clustering effects, and
may return a highly variable number of results for very small sample sizes.
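
To put rough numbers on the "highly variable number of results" wording, here is a toy simulation. It is only a sketch under simplifying assumptions, not PostgreSQL's actual sampling code: every page holds the same number of rows, BERNOULLI keeps each row independently with probability pct/100, and SYSTEM keeps each whole page with that probability. The function names and table dimensions are made up for illustration.

```python
import random
import statistics

def bernoulli_count(n_pages, rows_per_page, pct, rng):
    # Row-level model: each row is kept independently with probability pct/100.
    p = pct / 100.0
    return sum(1 for _ in range(n_pages * rows_per_page) if rng.random() < p)

def system_count(n_pages, rows_per_page, pct, rng):
    # Page-level model: each page is kept with probability pct/100,
    # and every row on a kept page is returned.
    p = pct / 100.0
    kept_pages = sum(1 for _ in range(n_pages) if rng.random() < p)
    return kept_pages * rows_per_page

def spread(sampler, n_pages=200, rows_per_page=50, pct=1.0, trials=200, seed=1):
    # Repeat the sampling and report mean and standard deviation of the row count.
    rng = random.Random(seed)
    counts = [sampler(n_pages, rows_per_page, pct, rng) for _ in range(trials)]
    return statistics.mean(counts), statistics.pstdev(counts)

if __name__ == "__main__":
    for name, sampler in (("BERNOULLI", bernoulli_count), ("SYSTEM", system_count)):
        mean, sd = spread(sampler)
        print(f"{name}: mean rows = {mean:.1f}, std dev = {sd:.1f}")
```

Both models aim for about 100 rows from this 10,000-row table, but the page-level one can only return multiples of 50 rows (including zero), so its row count swings much more from run to run.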


BTW this was one of the motivations for making the tsm_system_rows contrib module; that one will give you an exact number of tuples while still doing page-level sampling. But since it does linear probing, it's only useful if you want a really small number of tuples, because it will always do random I/O even if you are scanning a large part of the table.
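
A rough way to picture that trade-off is the toy model below. It is not the module's code: the function name, the in-memory list-of-pages "table", and the choice of a random starting page with a step coprime to the page count are all assumptions made for illustration. It hops between pages, takes whole pages as it goes, and stops as soon as the requested number of rows has been collected.

```python
import math
import random

def system_rows_sketch(pages, n_rows, rng):
    # pages: list of pages, each page a list of rows (hypothetical in-memory table).
    # Returns (rows, visited_page_numbers).
    n_pages = len(pages)
    if n_pages == 0 or n_rows <= 0:
        return [], []
    start = rng.randrange(n_pages)
    # Pick a step coprime to n_pages so the probe visits every page exactly once.
    step = 1 if n_pages == 1 else rng.randrange(1, n_pages)
    while math.gcd(step, n_pages) != 1:
        step = rng.randrange(1, n_pages)
    rows, visited = [], []
    block = start
    for _ in range(n_pages):
        visited.append(block)
        for row in pages[block]:
            rows.append(row)
            if len(rows) == n_rows:
                return rows, visited  # exact count reached, stop mid-page
        block = (block + step) % n_pages
    return rows, visited  # table had fewer rows than requested

if __name__ == "__main__":
    table = [[(p, r) for r in range(10)] for p in range(50)]
    rows, visited = system_rows_sketch(table, 25, random.Random(0))
    print(len(rows), "rows from pages", visited)
```

The row count is exact, but the visited pages are scattered across the table rather than read in order, which is where the random I/O cost comes from once the requested count gets large.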


I will try to reword or add something to make it clear that this can
return a variable number of blocks and thus produces a result with
greater variability in the number of rows returned.

It's documented on the SELECT page only; plus there is a whole new
section on writing tablesample functions.

Seems like it would be nice to have more detailed user docs somewhere
which explain the sampling algos we have, especially if we get more in
the future.  Not sure where would be appropriate for that, though.

If there is no appropriate place, I'll just write a blog.


There is a blog post on the 2ndQ blog page which tries to describe the sampling methods visually; not sure if it's more obvious from that or not. It's somewhat broken on planet though (only the title shows up there).

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

