Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

Petr Jelinek Thu, 06 Aug 2015 13:46:34 -0700

On 2015-08-06 22:25, Josh Berkus wrote:

On 08/06/2015 01:19 PM, Simon Riggs wrote:

For me, the docs seem exactly correct. The mathematical implications of
that just aren't recorded explicitly.


Well, for the SELECT page, all we need is the following (one changed
sentence):

The SYSTEM method is significantly faster than the BERNOULLI method when
small sampling percentages are specified, but it may return a
less-random sample of the table as a result of clustering effects, and
may return a highly variable number of results for very small sample sizes.

BTW this was one of the motivations for making the tsm_system_rows contrib module; that one will give you an exact number of tuples while still doing page-level sampling. But since it does linear probing, it's only useful if you want those really small numbers of tuples, because it will always do random I/O even if you are scanning a large part of the table.
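
To illustrate the idea of getting an exact row count from page-level sampling with linear probing, here is a rough Python sketch. It is not the tsm_system_rows source: the function name sample_exact_rows, the list-of-pages table model, and the choice of a step that is coprime to the page count are all assumptions made for illustration.

```python
import math
import random


def sample_exact_rows(pages, n_rows, rng):
    """Return up to n_rows rows by probing whole pages (hypothetical model).

    pages  -- list of pages, each page being a list of rows
    n_rows -- exact number of rows wanted
    rng    -- a random.Random instance
    """
    nblocks = len(pages)
    if nblocks == 0 or n_rows <= 0:
        return []

    # Start at a random page and jump by a fixed step. A step that is
    # coprime to nblocks visits every page exactly once before repeating.
    start = rng.randrange(nblocks)
    step = rng.randrange(1, nblocks + 1)
    while math.gcd(step, nblocks) != 1:
        step = rng.randrange(1, nblocks + 1)

    rows = []
    block = start
    for _ in range(nblocks):
        # Each probe lands on a different page, i.e. a random I/O.
        for row in pages[block]:
            rows.append(row)
            if len(rows) == n_rows:
                return rows
        block = (block + step) % nblocks
    return rows  # table had fewer than n_rows rows
```

In this model the number of pages touched grows with the number of rows requested, and every page touched is a jump to a different place in the table, which is why the approach only pays off for small row counts.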


I will try to reword or add something to make it clear that this can
return a variable number of blocks and thus produces a result with
greater variability in the number of rows returned.
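
The variability can be seen in a small simulation. The Python sketch below is a simplified model, not PostgreSQL's implementation: it assumes SYSTEM keeps each page independently with probability equal to the sampling fraction and returns every row on a kept page, while BERNOULLI keeps each row independently. All function names and parameters are hypothetical.

```python
import random


def system_sample_count(n_pages, rows_per_page, fraction, rng):
    """Rows returned when whole pages are kept with probability `fraction`."""
    kept_pages = sum(1 for _ in range(n_pages) if rng.random() < fraction)
    return kept_pages * rows_per_page


def bernoulli_sample_count(n_pages, rows_per_page, fraction, rng):
    """Rows returned when each row is kept with probability `fraction`."""
    total_rows = n_pages * rows_per_page
    return sum(1 for _ in range(total_rows) if rng.random() < fraction)


def prob_zero_rows_system(n_pages, fraction):
    """Chance that the page-level model keeps no page at all."""
    return (1.0 - fraction) ** n_pages


if __name__ == "__main__":
    rng = random.Random(42)
    # Assumed table shape: 100 pages of 100 rows, sampled at 0.1%.
    n_pages, rows_per_page, fraction = 100, 100, 0.001
    system = [system_sample_count(n_pages, rows_per_page, fraction, rng)
              for _ in range(20)]
    bernoulli = [bernoulli_sample_count(n_pages, rows_per_page, fraction, rng)
                 for _ in range(20)]
    print("SYSTEM-like counts:   ", system)
    print("BERNOULLI-like counts:", bernoulli)
    print("P(zero rows, SYSTEM-like):",
          prob_zero_rows_system(n_pages, fraction))
```

Under these assumptions the page-level counts are always a multiple of the rows per page, so for a tiny fraction on a small table the result is usually zero rows and occasionally a whole page, whereas the row-level counts cluster around the expected value.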

It's documented on the SELECT page only; plus there is a whole new
section on writing tablesample functions.


Seems like it would be nice to have more detailed user docs somewhere
which explain the sampling algos we have, especially if we get more in
the future.  Not sure where would be appropriate for that, though.

If there is no appropriate place, I'll just write a blog.

There is a blog post on the 2ndQ blog page which tries to describe the sampling methods visually; not sure if it's more obvious from that or not. It's somewhat broken on planet though (only the title shows up there).


--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


