Re: [HACKERS] Gsoc2012 idea, tablesample

Kevin Grittner Thu, 10 May 2012 09:37:35 -0700

Robert Haas <[email protected]> wrote:

> I wonder if you could do this with something akin to the Bitmap
> Heap Scan machinery.  Populate a TID bitmap with a bunch of
> randomly chosen TIDs, fetch them all in physical order

It would be pretty hard for any other plan to beat that by very
much, so it seems like a good approach which helps keep things
simple.
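
To make the "physical order" point concrete, here is a minimal sketch
in Python (a toy model, not PostgreSQL code).  It assumes a
hypothetical table with a fixed number of slots per page, which real
heap pages do not have; random_tids and pages_touched are names made
up for this illustration.  Sorting the chosen TIDs means each page is
visited at most once, however many TIDs land on it.

```python
import random

def random_tids(n_blocks, slots_per_page, k, rng):
    """Pick k distinct candidate TIDs and return them in physical order.

    Assumption: every page has exactly slots_per_page slots.  A TID is
    modelled as (block, offset), with offsets starting at 1.
    """
    total_slots = n_blocks * slots_per_page
    picks = rng.sample(range(total_slots), k)  # distinct slot numbers
    tids = [(p // slots_per_page, p % slots_per_page + 1) for p in picks]
    return sorted(tids)  # block-major order, like a bitmap heap scan

def pages_touched(tids):
    """Number of distinct pages a scan over these TIDs has to read."""
    return len({block for block, _ in tids})

if __name__ == "__main__":
    tids = random_tids(n_blocks=100, slots_per_page=50, k=200,
                       rng=random.Random(0))
    print(len(tids), "TIDs on", pages_touched(tids), "pages")
```

However many TIDs are requested, the page count can never exceed the
number of blocks in the table, which is what makes this hard to beat.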

> and if you don't get as many rows as you need, rinse and repeat
> until you do.

Ay, there's the rub.  If you get too many, it is important that you
read all the way to the end and then randomly omit some of them. 
While a bit of a bother, that's pretty straightforward and should be
pretty fast, assuming you're not, like, an order of magnitude high. 
But falling short is tougher; making up the difference could be an
iterative process, which in the worst case has you read every tuple
in the table without filling your sample.  Still, this
approach seems like it would perform better than generating random
ctid values and randomly fetching until you've tried them all.
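
A rough sketch of the too-many / too-few handling, again as a toy
Python model rather than anything resembling the executor.  The
table is assumed to be a dict holding only live tuples, keyed by
(block, offset); sample_rows and the oversample factor are
hypothetical names chosen for this example.  Each batch is read all
the way to the end before trimming, and a short batch triggers
another round over slots not yet tried.

```python
import random

def sample_rows(table, n_blocks, slots_per_page, target, rng,
                oversample=1.5):
    """Return up to `target` rows chosen at random, in physical order.

    table: dict mapping (block, offset) -> row, live tuples only
    (an assumption of this sketch; dead or empty slots are just absent).
    """
    untried = list(range(n_blocks * slots_per_page))
    rng.shuffle(untried)  # each slot is tried at most once
    found = []
    while len(found) < target and untried:
        need = target - len(found)
        batch_size = min(len(untried), max(1, int(need * oversample)))
        batch = [untried.pop() for _ in range(batch_size)]
        # Visit the batch in physical order, and read it to the end
        # even if the target is reached partway through.
        tids = sorted((s // slots_per_page, s % slots_per_page + 1)
                      for s in batch)
        found.extend(tid for tid in tids if tid in table)
    if len(found) > target:
        # Overshot: randomly omit the excess instead of keeping
        # whichever tuples happened to come first on disk.
        found = rng.sample(found, target)
    return [table[tid] for tid in sorted(found)]
```

If the table holds fewer live tuples than the target, the loop only
stops once every slot has been tried, which is the worst case
described above.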

> I'm worried this project is getting so complicated that it will be
> beyond the ability of a new hacker to get anything useful done. 
> Can we simplify the requirements here to something that is
> reasonable for a beginner?

I would be inclined to omit monetary unit sampling from the first
commit.  Do the parts specified in the standard first and get it
committed.  Useful as unit sampling is, it seems like the hardest to
do, and should probably be done "if time permits" or left as a
future enhancement.  It's probably enough to just remember that it's
there and make a "best effort" attempt not to paint ourselves into a
corner which precludes its development.

-Kevin


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
