On Thu, 15 Apr 2004 20:18:49 -0400, Tom Lane <[EMAIL PROTECTED]> wrote:
>> getting several tuples from the same page is more likely
>> than with the old method.
>
>Hm, are you sure?

Almost sure.  Let's look at a corner case: what is the probability of
getting a sample in which no two tuples come from the same page?  To
simplify the problem, assume that each page contains the same number of
tuples, c.  If the number of pages is B and the sample size is n, a
perfect sampling method collects a sample in which all tuples come from
different pages with probability (in OpenOffice.org syntax):

	p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i}}

or in C (the cast avoids integer division when c, B and i are ints):

	p = 1.0;
	for (i = 0; i < n; ++i)
		p *= (double) c * (B - i) / ((double) c * B - i);

The i-th factor (counting from 0) is the chance that the next tuple
drawn is one of the c(B - i) tuples on the B - i pages not yet touched,
out of the cB - i tuples not yet drawn.  Every factor increases with B,
so this probability grows with increasing B.

>Also, I'm not at all sure that the old method satisfies that constraint
>completely in the presence of nonuniform numbers of tuples per page,
>so we'd not necessarily be going backwards anyhow ...

Yes, it boils down to a decision whether we want to replace one not
quite perfect sampling method with another not quite perfect method.
I'm still working on putting together the pros and cons ...

Servus
 Manfred