Robert Haas <robertmh...@gmail.com> wrote:

> I wonder if you could do this with something akin to the Bitmap
> Heap Scan machinery. Populate a TID bitmap with a bunch of
> randomly chosen TIDs, fetch them all in physical order

It would be pretty hard for any other plan to beat that by very
much, so it seems like a good approach which helps keep things
simple.

> and if you don't get as many rows as you need, rinse and repeat
> until you do.

Ay, there's the rub. If you get too many, it is important that you
read all the way to the end and then randomly omit some of them.
While a bit of a bother, that's pretty straightforward and should be
pretty fast, assuming you're not, like, an order of magnitude high.
But falling short is tougher; making up the difference could be an
iterative process, which could always wind up with having you read
all tuples in the table without filling your sample. Still, this
approach seems like it would perform better than generating random
ctid values and randomly fetching until you've tried them all.

> I'm worried this project is getting so complicated that it will be
> beyond the ability of a new hacker to get anything useful done.
> Can we simplify the requirements here to something that is
> reasonable for a beginner?

I would be inclined to omit monetary unit sampling from the first
commit. Do the parts specified in the standard first and get it
committed. Useful as unit sampling is, it seems like the hardest to
do, and should probably be done "if time permits" or left as a
future enhancement. It's probably enough to just remember that it's
there and make a "best effort" attempt not to paint ourselves in a
corner which precludes its development.

-Kevin
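
To make the "fetch in physical order, omit extras, repeat if short"
loop discussed above concrete, here is a minimal sketch in Python. It
is not PostgreSQL code: the table is modeled as an in-memory list of
blocks whose slots hold a row or None (a dead or empty slot), and the
names sample_rows, blocks, and batch_size are hypothetical, chosen
only for illustration. Each batch of random TIDs is sorted before
fetching, every batch is read to the end, and any surplus is dropped
at random afterward.

```python
import random


def sample_rows(blocks, n, batch_size, rng=None):
    """Return up to n rows chosen at random from a toy in-memory 'table'.

    blocks is a list of blocks; each block is a list of slots holding a
    row or None. This layout is an assumption made for illustration.
    """
    rng = rng or random.Random()

    # Every candidate TID as (block number, offset), in random order.
    tids = [(b, o) for b, slots in enumerate(blocks) for o in range(len(slots))]
    rng.shuffle(tids)

    found = []
    for start in range(0, len(tids), batch_size):
        # Visit each batch in physical order, like a bitmap heap scan.
        batch = sorted(tids[start:start + batch_size])
        for b, o in batch:
            row = blocks[b][o]
            if row is not None:
                found.append(row)
        # Read the whole batch before deciding whether we have enough.
        if len(found) >= n:
            break

    # Too many: randomly omit the surplus so the result stays unbiased.
    if len(found) > n:
        found = rng.sample(found, n)

    # Too few: every TID was tried, so return whatever live rows exist.
    return found
```

In the worst case the loop visits every TID and still comes up short,
which is the "falling short" case described above; the sketch then
simply returns all live rows it found.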
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers