Manfred Koizar [EMAIL PROTECTED] writes:
Random sampling means, roughly, that every possible sample is equally likely to
be collected, and two-stage sampling doesn't satisfy this condition.
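To make the distinction concrete, here is a minimal sketch. It assumes, for illustration only, that every page holds exactly c tuples, and it uses a simplified, hypothetical two-stage scheme: pick up to n pages at random, then pick n tuples uniformly from the tuples on those pages. It is not the actual ANALYZE code. The helper `same_page_rate` (also hypothetical) measures how often at least two sampled tuples share a page.

```python
import random

# Tuples are identified as (page, slot) pairs. Every page is assumed to
# hold exactly c tuples; this is a simplification for illustration.

def simple_random_sample(B, c, n, rng):
    """Every set of n distinct tuples is equally likely to be returned."""
    all_tuples = [(page, slot) for page in range(B) for slot in range(c)]
    return rng.sample(all_tuples, n)

def two_stage_sample(B, c, n, rng):
    """Hypothetical two-stage scheme: stage 1 picks up to n pages at
    random, stage 2 picks n tuples uniformly from those pages' tuples."""
    pages = rng.sample(range(B), min(n, B))
    candidates = [(page, slot) for page in pages for slot in range(c)]
    return rng.sample(candidates, n)

def same_page_rate(sampler, B, c, n, trials=20000, seed=1):
    """Fraction of simulated samples in which two tuples share a page."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sampler(B, c, n, rng)
        if len({page for page, _slot in sample}) < n:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for sampler in (simple_random_sample, two_stage_sample):
        print(sampler.__name__, same_page_rate(sampler, B=3, c=2, n=2))
```

For B = 3, c = 2, n = 2 the exact same-page probabilities are 3/15 = 1/5 for simple random sampling and 2/6 = 1/3 for this two-stage variant, so in this toy case the two-stage sampler does favor samples with several tuples from one page.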
Okay, I finally see the point here: in the limit as the number of pages
B goes to infinity, you'd expect the
On Thu, 15 Apr 2004 20:18:49 -0400, Tom Lane [EMAIL PROTECTED] wrote:
getting several tuples from the same page is more likely
than with the old method.
Hm, are you sure?
Almost sure. Let's look at a corner case: What is the probability of
getting a sample with no two tuples from the same
Manfred Koizar [EMAIL PROTECTED] writes:
If the number of pages is B and the sample size is n, a perfect sampling
method collects a sample where all tuples come from different pages with
probability (in OpenOffice.org syntax):
p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i}}
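The excerpt does not define c. The sketch below assumes it is the number of tuples per page, so cB is the table's tuple count. Under that reading, after i tuples have been drawn from i distinct pages, cB - i tuples remain, of which c(B - i) lie on untouched pages, giving the i-th factor. The functions `p_all_distinct_pages` and `estimate_by_simulation` are hypothetical names. The first evaluates the product exactly, and the second checks it against simulated simple random sampling.

```python
import random
from fractions import Fraction

def p_all_distinct_pages(B, c, n):
    """Exact probability that n tuples drawn without replacement from
    B pages of c tuples each all lie on different pages:
    p = prod_{i=0}^{n-1} c(B - i) / (cB - i)."""
    p = Fraction(1)
    for i in range(n):
        p *= Fraction(c * (B - i), c * B - i)
    return p

def estimate_by_simulation(B, c, n, trials, seed=0):
    """Monte Carlo estimate of the same probability under simple
    random sampling; tuple number t lives on page t // c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(range(B * c), n)
        if len({t // c for t in sample}) == n:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    exact = p_all_distinct_pages(20, 4, 5)
    print(float(exact), estimate_by_simulation(20, 4, 5, trials=20000))
```

Using `Fraction` keeps the product exact for small cases. For example, with B = 2, c = 2, n = 2 it returns 2/3, matching a direct count of 4 cross-page pairs out of 6. The factor for i = B is zero, so p drops to 0 whenever n > B.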
On Tue, 2004-04-13 at 15:18, Tom Lane wrote:
Robert Treat [EMAIL PROTECTED] writes:
Well, the first problem is why is ANALYZE's estimate of the total row
count so bad :-( ? I suspect you are running into the situation where
the initial pages of the table are thinly populated and ANALYZE
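A toy illustration of why thinly populated initial pages could skew a row-count estimate. This sketch assumes, purely for illustration, an estimator that multiplies the average tuple density of the pages it reads by the total page count. The table layout and both helper functions are hypothetical, not PostgreSQL's code.

```python
import random

def extrapolate_from_prefix(tuples_per_page, pages_read):
    """Toy estimator (assumption): density of the first pages_read
    pages times the total page count."""
    prefix = tuples_per_page[:pages_read]
    return sum(prefix) / len(prefix) * len(tuples_per_page)

def extrapolate_from_random_pages(tuples_per_page, k, rng):
    """Same extrapolation, but from k pages chosen uniformly at random."""
    pages = rng.sample(range(len(tuples_per_page)), k)
    return sum(tuples_per_page[p] for p in pages) / k * len(tuples_per_page)

# Hypothetical table: 1000 pages, the first 100 thinly populated
# (5 tuples each), the remaining 900 full (50 tuples each).
table = [5] * 100 + [50] * 900
true_rows = sum(table)                              # 45500
prefix_guess = extrapolate_from_prefix(table, 100)  # 5000.0
random_guess = extrapolate_from_random_pages(table, 100, random.Random(42))

if __name__ == "__main__":
    print(true_rows, prefix_guess, round(random_guess))
```

Reading only the sparse prefix underestimates the row count by roughly a factor of nine in this made-up layout. Extrapolating from the same number of randomly chosen pages lands close to the true count.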
On Fri, 16 Apr 2004 10:34:49 -0400, Tom Lane [EMAIL PROTECTED] wrote:
p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i}}
So? You haven't proven that either sampling method fails to do the
same.
On the contrary, I believe that the above formula is more or less valid for
both methods.
[Just a quick note here; a more thorough discussion of my test results
will be posted to -hackers]
On Tue, 13 Apr 2004 15:18:42 -0400, Tom Lane [EMAIL PROTECTED] wrote:
Well, the first problem is why is ANALYZE's estimate of the total row
count so bad :-( ? I suspect you are running into the
Manfred Koizar [EMAIL PROTECTED] writes:
My biggest concern at the moment is that the new sampling method
violates the contract of returning each possible sample with the same
probability: getting several tuples from the same page is more likely
than with the old method.
Hm, are you sure? I
In the process of optimizing some queries, I have found that the following
query seems to degrade in performance the more accurate I make the
statistics on the table, whether by raising the target with alter table ...
set statistics or by running vacuum.
SELECT
count( cl.caller_id ),