On 04/10/15 21:57, Petr Jelinek wrote:
On 10/04/15 21:26, Peter Eisentraut wrote:

But this was not really my point. BERNOULLI just does not work
well with a row limit by definition: it applies a probability to each
individual row, and while you can get the probability from a percentage
very easily (just divide by 100), to get it for a specific target number
of rows you have to know the total number of source rows. That's not
something we can do very accurately, so you won't get 500 rows
but approximately 500 rows.

It's actually even trickier. Even if you happen to know the exact number of rows in the table, you can't just convert the target into a percentage and use it for BERNOULLI sampling. It may still give you a different number of result rows, because each row is sampled independently: the result size is binomially distributed around the target rather than fixed at it.
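
To illustrate the point, here is a minimal Python sketch (not PostgreSQL code; the function name, the table size of 100,000 rows and the target of 500 rows are made up for the example). The total row count is known exactly, yet the sample size still varies from run to run:

```python
import random

def bernoulli_sample(rows, probability, rng):
    """Keep each row independently with the given probability."""
    return [row for row in rows if rng.random() < probability]

rng = random.Random(42)   # fixed seed only so the run is repeatable
total_rows = 100_000      # hypothetical table size, known exactly
target_rows = 500         # hypothetical row limit
probability = target_rows / total_rows

# Each run keeps a binomially distributed number of rows, so the
# sizes scatter around 500 instead of hitting it every time.
sizes = [len(bernoulli_sample(range(total_rows), probability, rng))
         for _ in range(20)]
print(sizes)
```

With these numbers the standard deviation of the sample size is roughly 22 rows, so counts in the high 400s or low 500s are entirely normal.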

That is why we have Vitter's algorithm for reservoir sampling, which works very differently from BERNOULLI.
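
For comparison, a minimal sketch of reservoir sampling in Python. Note that this is the simple textbook variant (usually called Algorithm R), not the optimized skip-based algorithm from Vitter's paper and not what the PostgreSQL source does; the function name is made up for the example. It does show the key difference though: the sample size is fixed up front, and the total number of rows does not need to be known.

```python
import random

def reservoir_sample(rows, k, rng):
    """Return k rows chosen uniformly at random from an iterable
    of unknown length (or all rows if there are fewer than k)."""
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i (0-based) replaces a random slot with
            # probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir

rng = random.Random(42)  # fixed seed only so the run is repeatable
sample = reservoir_sample(range(100_000), 500, rng)
print(len(sample))
```

Unlike the BERNOULLI case, the result here always has exactly 500 rows as long as the input has at least that many.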

Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)