On Thu, 15 Apr 2004 20:18:49 -0400, Tom Lane <[EMAIL PROTECTED]> wrote:
>> getting several tuples from the same page is more likely
>> than with the old method.
>Hm, are you sure?

Almost sure.  Let's look at a corner case:  What is the probability of
getting a sample with no two tuples from the same page?  To simplify the
problem assume that each page contains the same number of tuples c.

If the number of pages is B and the sample size is n, a perfect sampling
method collects a sample where all tuples come from different pages with
probability (in OpenOffice.org syntax):

        p = prod from{i = 0} to{n - 1} {{c(B - i)}  over {cB - i}}

or in C:

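        double p = 1.0;
        int    i;

        /* factor i: the (i+1)-th tuple must miss the i pages already hit */
        for (i = 0; i < n; ++i)
                p *= (double) c * (B - i) / ((double) c * B - i);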

For fixed n and c, this probability grows with increasing B.
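Why it grows: each factor can be rewritten so that B appears only in a denominator. A short derivation, assuming c >= 1 and n <= B (so every factor is positive), plus a rough approximation using ln(1 - x) ~ -x for small x:

```latex
\frac{c(B-i)}{cB-i} \;=\; 1 - \frac{i(c-1)}{cB-i}, \qquad i = 0, \dots, n-1

p \;\approx\; \exp\!\left( -\,\frac{(c-1)\,n(n-1)}{2cB} \right) \qquad (n \ll B)
```

For fixed c and i, the subtracted term shrinks as B grows, so each factor moves toward 1 and the product can only increase; it is strictly increasing when c > 1 and n >= 2. The approximation suggests p stays close to 1 only while n^2 is small compared with B.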

>Also, I'm not at all sure that the old method satisfies that constraint
>completely in the presence of nonuniform numbers of tuples per page,
>so we'd not necessarily be going backwards anyhow ...

Yes, it boils down to deciding whether we want to replace one not-quite-perfect
sampling method with another not-quite-perfect method.
I'm still working on putting together the pros and cons ...
