On Sun, 7 Mar 2004, Tom Lane wrote:

> Neil Conway <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> This is something we need to consider, but we'll need more evidence
> >> before making a choice.  One thing that we have very little data about
> >> is how much difference it makes in the quality of planner choices.
> 
> > Right, but is there a practical way to actually get this data?
> 
> I haven't thought of one yet, but perhaps someone will have an idea.

Hi Tom.  I ran some very simple tests of analyze times and query planning 
times on a very simple table with randomly distributed data.  The index 
was on a date field, since that's what I was testing last.

This was all done on my 1.1 GHz Celeron workstation with 512 MB of memory 
and an IDE drive.  I'd love more input on better testing methodologies here...

The table had either 100k or 1M rows that look like this (I'll test 10M 
rows later, which means the dataset won't fit in memory, so there will be 
a lot of disk access going on; right now the 1M-row table is 80 MB):

 select * from test2 limit 5;
                    info                     |         dt          |   id
---------------------------------------------+---------------------+---------
 Francize perfectible swirling fluctuates    | 2004-05-20 20:12:04 | 2721995
 Fields chauffeur attentionality grandmother | 2004-04-07 14:36:02 | 2721996
 Belgium bilked explosively defendant        | 2004-09-16 16:27:22 | 2721997
 perspectives Buenos Pollux discriminates    | 2004-11-11 12:28:31 | 2721998
 Victorianize Savonarola blackmails sufficed | 2004-02-27 21:17:20 | 2721999
(5 rows)
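
For reference, a minimal sketch of the schema (the exact DDL isn't shown 
above, so the column types are inferred from the sample output and may 
differ from what I actually used):

 -- Hypothetical reconstruction of the test table; types are
 -- inferred from the sample output above.
 CREATE TABLE test2 (
     info text,
     dt   timestamp,
     id   integer
 );

 -- The index under test was on the date field.
 CREATE INDEX test2_dt_idx ON test2 (dt);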

Here are the analyze and plan times I get at different statistics targets:

target   100k analyze   1M analyze   1M plan
         (ms)           (ms)         (ms)
10       250            875          2
20       350            1250
30       430            1500
40       520            1725
50       580            1900
60       690            2100
70       775            2175
80       850            2300
90       950            2400
100      1000           2600         2.5
200      1806           3700
300      2600           4800
400      2600           5900
500      2600           7200
700      2600           9500
1000     2600           13000        5
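
For anyone wanting to reproduce this: each row above came from setting the 
column's statistics target and re-running ANALYZE, something along these 
lines (the exact invocations aren't shown above, so treat this as a 
sketch; I took the elapsed times from psql's \timing):

 -- Set the per-column statistics target, then re-analyze the table.
 -- psql's \timing reports the elapsed time for each statement.
 ALTER TABLE test2 ALTER COLUMN dt SET STATISTICS 100;
 ANALYZE test2;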

Since the data is randomly distributed, I didn't bother doing much testing 
of how accurate each target setting was.  If that would be useful to know, 
I'd gladly test it, but I set out only to measure the time to analyze and 
the time to plan.

Note that I tested only three targets for planning time, since the target 
didn't seem to make much difference there.  The query was:

select * from test2 where dt between 'march 11, 2004' and 'march 13, 2004';
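
To see the plan the planner picks (EXPLAIN in this version doesn't report 
planning time directly, so the plan-time numbers above were measured 
separately; this is just to show the query shape):

 -- Show the plan chosen for the test query; with the index on dt,
 -- a narrow date range like this should favor an index scan.
 EXPLAIN
 SELECT * FROM test2
 WHERE dt BETWEEN 'march 11, 2004' AND 'march 13, 2004';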

I also ran some quick tests on smaller tables (1000 and 10k rows); there, 
the plateau we see in the 100k analyze times shows up much sooner, at a 
target of around 50.  That is, analyze time flattened out quickly, and 
higher targets cost very little, if anything.

Since this query yields quite a simple plan, I'd expect to need a much 
more complex one to test the increase in planning time, say something that 
has to look at a lot of statistics.  Is there a particular join type or 
query shape that's likely to do that?
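
For instance, would something like this self-join sketch do it?  (Purely 
illustrative, reusing test2 so nothing new is needed; it would force the 
planner to consult the statistics for several clauses and join-size 
estimates.)

 -- A hypothetical harder case: a three-way self-join with range
 -- predicates, so the planner must estimate selectivity for each
 -- clause and sizes for each join.
 SELECT *
 FROM test2 a
 JOIN test2 b ON a.id = b.id
 JOIN test2 c ON b.dt = c.dt
 WHERE a.dt BETWEEN 'march 11, 2004' AND 'march 13, 2004'
   AND c.id < 2722000;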


