Re: [PERFORM] query slows down with more accurate stats

2004-04-19 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes: Random sampling is more like every possible sample is equally likely to be collected, and two-stage sampling doesn't satisfy this condition. Okay, I finally see the point here: in the limit as the number of pages B goes to infinity, you'd expect the

Re: [PERFORM] query slows down with more accurate stats

2004-04-16 Thread Manfred Koizar
On Thu, 15 Apr 2004 20:18:49 -0400, Tom Lane [EMAIL PROTECTED] wrote: getting several tuples from the same page is more likely than with the old method. Hm, are you sure? Almost sure. Let's look at a corner case: What is the probability of getting a sample with no two tuples from the same

Re: [PERFORM] query slows down with more accurate stats

2004-04-16 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes: If the number of pages is B and the sample size is n, a perfect sampling method collects a sample where all tuples come from different pages with probability (in OpenOffice.org syntax): p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i}}

Re: [PERFORM] query slows down with more accurate stats

2004-04-16 Thread Robert Treat
On Tue, 2004-04-13 at 15:18, Tom Lane wrote: Robert Treat [EMAIL PROTECTED] writes: Well, the first problem is why is ANALYZE's estimate of the total row count so bad :-( ? I suspect you are running into the situation where the initial pages of the table are thinly populated and ANALYZE

Re: [PERFORM] query slows down with more accurate stats

2004-04-16 Thread Manfred Koizar
On Fri, 16 Apr 2004 10:34:49 -0400, Tom Lane [EMAIL PROTECTED] wrote: p = prod from{i = 0} to{n - 1} {{c(B - i)} over {cB - i}} So? You haven't proven that either sampling method fails to do the same. On the contrary, I believe that above formula is more or less valid for both methods.

Re: [PERFORM] query slows down with more accurate stats

2004-04-15 Thread Manfred Koizar
[Just a quick note here; a more thorough discussion of my test results will be posted to -hackers] On Tue, 13 Apr 2004 15:18:42 -0400, Tom Lane [EMAIL PROTECTED] wrote: Well, the first problem is why is ANALYZE's estimate of the total row count so bad :-( ? I suspect you are running into the

Re: [PERFORM] query slows down with more accurate stats

2004-04-15 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes: My biggest concern at the moment is that the new sampling method violates the contract of returning each possible sample with he same probability: getting several tuples from the same page is more likely than with the old method. Hm, are you sure? I

[PERFORM] query slows down with more accurate stats

2004-04-13 Thread Robert Treat
In the process of optimizing some queries, I have found the following query seems to degrade in performance the more accurate I make the statistics on the table... whether by using increased alter table ... set statistics or by using vacuum.. SELECT count( cl.caller_id ),