On Tue, Apr 24, 2012 at 10:31 AM, Sandro Santilli <s...@keybit.net> wrote:
> On Tue, Apr 24, 2012 at 08:49:26AM +0200, Sandro Santilli wrote:
>> On Mon, Apr 23, 2012 at 08:34:44PM +0300, Ants Aasma wrote:
>> > SELECT (SELECT reservoir_sample(some_table, 50) AS samples
>> >    FROM some_table WHERE ctid =~ ANY (rnd_pgtids))
>> > FROM random_pages('some_table', 50) AS rnd_pgtids;
>> But I don't understand the reservoir_sample call, what is it supposed to do ?
> Ok got it, that was probably to avoid:
>  ERROR:  more than one row returned by a subquery used as an expression

No, it's to avoid bias towards tuples on more sparsely populated
pages. See http://en.wikipedia.org/wiki/Reservoir_sampling or

> The advanced TID operator would be for random_tids to only return pages rather
> than full tids...

Exactly. But when mainly IO bound (ie. sampling from a large table on
spinning rust) the overhead of probing with TID scan as opposed to
sequentially scanning the pages should be small enough. When CPU bound
I suspect that the function call machinery overhead for
reservoir_sample is going to become a large issue, so a built in
tablesample also has an edge there.

Ants Aasma
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to