On May 10, 2012, at 18:36, Kevin Grittner wrote:
> Robert Haas <robertmh...@gmail.com> wrote:
>> I wonder if you could do this with something akin to the Bitmap
>> Heap Scan machinery.  Populate a TID bitmap with a bunch of
>> randomly chosen TIDs, fetch them all in physical order
>> and if you don't get as many rows as you need, rinse and repeat
>> until you do.
> Ay, there's the rub.  If you get too many, it is important that you
> read all the way to the end and then randomly omit some of them.

Why is that? From a statistical point of view it shouldn't matter
whether you pick N random samples, or pick M >= N random samples and
then randomly pick N from M (random implying uniformly distributed).
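The two-pass equivalence can be sketched in a few lines of Python. This is a simulation of the argument, not PostgreSQL code, and the helper name is made up for illustration:

```python
import random

def two_pass_sample(population, n, oversample_factor=2):
    """Draw M >= N uniform picks without replacement, then uniformly
    thin the intermediate result down to exactly N.  Because both
    passes are uniform, the final N picks are themselves a uniform
    sample of the population."""
    m = min(len(population), oversample_factor * n)
    # First pass: M uniform samples without replacement.
    intermediate = random.sample(population, m)
    # Second pass: uniformly keep N of the M samples.
    return random.sample(intermediate, n)
```

Running many trials of `two_pass_sample` and tallying how often each element appears should show roughly equal frequencies, matching a direct `random.sample(population, n)`.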

> While a bit of a bother, that's pretty straightforward and should be
> pretty fast, assuming you're not, like, an order of magnitude high. 
> But falling short is tougher; making up the difference could be an
> iterative process, which could always wind up with having you read
> all tuples in the table without filling your sample.

But the likelihood of that happening is extremely low, no? Unless the
sampling percentage is very high, that is, but that case isn't of much
practical importance anyway.

But something else comes to mind. Does the standard permit samples taken
with the BERNOULLI method to contain the same tuple multiple times? If
not, any kind of TID-based approach will have to track all previously
fetched TIDs, which seems doable but unfortunate...
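For concreteness, here is a rough sketch of the iterative TID approach under discussion, with the duplicate tracking that a duplicate-free BERNOULLI sample would require. Python stands in for the executor logic, and all names are hypothetical:

```python
import random

def tid_sample(all_tids, target_n, batch=64):
    """Repeatedly pick random TIDs, fetch the new ones in physical
    order (as a bitmap heap scan would), and loop until target_n
    distinct tuples are collected.  'seen' prevents the same tuple
    from appearing twice; a surplus is randomly thinned at the end."""
    seen = set()
    result = []
    while len(result) < target_n and len(seen) < len(all_tids):
        # Pick a batch of candidate TIDs uniformly at random.
        candidates = random.choices(all_tids, k=batch)
        # Visit only not-yet-fetched TIDs, in physical order.
        for tid in sorted(set(candidates) - seen):
            seen.add(tid)
            result.append(tid)  # stand-in for fetching the tuple
    if len(result) > target_n:
        # Got too many: randomly omit some, as Kevin describes above.
        result = random.sample(result, target_n)
    return result
```

Note the loop's worst case is exactly the one Kevin raises: with a very high sampling fraction it may visit every TID in the table before filling the sample.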

best regards,
Florian Pflug

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)