On Sun, 7 Mar 2004, Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> This is something we need to consider, but we'll need more evidence
> >> before making a choice. One thing that we have very little data about
> >> is how much difference it makes in the quality of planner choices.
>
> > Right, but is there a practical way to actually get this data?
>
> I haven't thought of one yet, but perhaps someone will have an idea.
Hi Tom. I ran some very simple tests of analyze time and query plan time on a very simple table with randomly distributed data. The index was on a date field, since that's what I was testing last. This was all done on my 1.1GHz Celeron workstation with 512MB of memory and an IDE drive. I'd love more input on better testing methodologies here...

The table has either 100k or 1M rows that look kinda like this. (I'll test 10M rows later; that dataset won't fit in memory, so there'll be lots of disk access going on. Right now the 1M row table is 80MB.)

select * from test2 limit 5;
                    info                      |         dt          |   id
----------------------------------------------+---------------------+---------
 Francize perfectible swirling fluctuates     | 2004-05-20 20:12:04 | 2721995
 Fields chauffeur attentionality grandmother  | 2004-04-07 14:36:02 | 2721996
 Belgium bilked explosively defendant         | 2004-09-16 16:27:22 | 2721997
 perspectives Buenos Pollux discriminates     | 2004-11-11 12:28:31 | 2721998
 Victorianize Savonarola blackmails sufficed  | 2004-02-27 21:17:20 | 2721999
(5 rows)

Here's what I get with different statistics targets (analyze times for the 100k and 1M row tables, plus plan times for the 1M table):

           100k       1M     1M
        analyze  analyze   plan
target       ms       ms     ms
    10      250      875      2
    20      350     1250
    30      430     1500
    40      520     1725
    50      580     1900
    60      690     2100
    70      775     2175
    80      850     2300
    90      950     2400
   100     1000     2600    2.5
   200     1806     3700
   300     2600     4800
   400     2600     5900
   500     2600     7200
   700     2600     9500
  1000     2600    13000      5

Since this data is randomly distributed, I didn't bother doing a lot of testing to see how accurate each target setting was. If that would be useful to know I'd gladly test it, but I only set out to measure the time to analyze and the time to plan.

Note that I only tested 3 targets for planning time, since it didn't seem to make a very big difference. The query was:

select * from test2 where dt between 'march 11, 2004' and 'march 13, 2004';

I also ran some quick tests on smaller tables (1000 and 10k rows). There, the plateau we see in the 100k analyze times shows up much sooner, at a target of around 50; that is, the analyze time flattened out quickly, and higher targets cost very little, if anything.

Since this query was quite an easy plan, I'd expect to need a much more complex one to test the increase in planning time, say something that has to look at a lot of statistics. Any particular join type or something that's likely to do that?
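One candidate I may try is a multi-way equijoin: as far as I can tell, eqjoinsel compares the most-common-value lists on both sides of each equality join clause, and those lists grow with the statistics target, so join-heavy plans should feel larger targets much more than this simple range scan does. A sketch of the kind of query I mean (test3 and test4 here are hypothetical copies of test2 that I haven't built yet):

explain select *
  from test2 a
  join test3 b on a.id = b.id
  join test4 c on b.dt = c.dt
 where a.dt between 'march 11, 2004' and 'march 13, 2004';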
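On the accuracy question above: the simple check would be to compare the planner's estimated row count against the actual row count at each target, which explain analyze reports side by side. A sketch, using the same test query:

explain analyze select * from test2
 where dt between 'march 11, 2004' and 'march 13, 2004';

The closer the top-level estimated rows is to the actual rows, the better that target captured the distribution.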
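And for anyone who wants to reproduce the timing numbers above, each data point boils down to something like this (a sketch; I'm assuming psql's \timing here, and setting default_statistics_target before the ANALYZE would work just as well as the per-column form):

\timing
-- bump the target on the indexed column, then regather stats;
-- the reported time is the analyze-time column above
alter table test2 alter column dt set statistics 100;
analyze test2;
-- explain plans the query without executing it, so the reported
-- time is roughly the plan-time column above (plus the round trip)
explain select * from test2
 where dt between 'march 11, 2004' and 'march 13, 2004';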