On 8 Feb 2011, at 2:39pm, Sven L wrote:

> Is it reasonable to guess that sqlite_stat* are used only for NATURAL JOINs 
> and other not-so-obvious joins?

No.  Consider this:

SELECT * FROM myTable WHERE a=104 AND b=213

Suppose there are two indexes on myTable: one indexes only column a, and the 
other indexes only column b.  The query optimizer has to choose between them.  
Which index would it be best to use ?

The answer depends on the chunkiness of each column.  If only three different 
values ever appear in column a, but column b can have a thousand different 
values, then it will be more efficient to use the index on column b.  This will 
return fewer rows which have to be scanned one-by-one for values in a.

But you're a good programmer.  Knowing that you were going to have SELECT 
commands like that one above you chose to create an index on both columns 
(either order, it doesn't matter).  Since you have done this, the results that 
ANALYZE gathers don't matter at all !  The query optimizer finds the good index 
and never has to choose between two bad indexes because it has the one perfect 
index which will always be best.

So generally speaking ANALYZE matters only if the query optimizer has two or 
more equally bad options.  If you do your job as a database designer well it 
will never need to consider chunkiness.  The above explanation is simplified 
but gives you the general idea.

> In my software, the database sometimes grows up to 1-2 GB immediately due to 
> the nature of the application. Hence, the guidelines for the ANALYZE command 
> do not suffice for me. "Do it once every x months" they say. User chooses to 
> import huge amounts of data on regular basis. He is also able to run custom 
> queries, which is why I found the sqlite_stat-tables useful.

Doesn't matter.  This new data won't change the relative chunkiness of the 
values in the columns.  Though the precise numbers change, the /relative/ 
chunkiness probably won't, so the choice of indexes probably won't need to 
change either.  Even less likely that it'll change by enough to change which 
index the query optimiser should pick.

Doing another ANALYZE only really matters if the character of your data 
changes, which is usually when a column that used to have very big chunks 
suddenly has tiny chunks.  The only time this happens if when a business 
changes what it does in a significant way: going from having five product lines 
to 500, or from selling in three countries to selling in thirty countries.  
After they've been putting in new data reflecting this new usage for a few 
months, /then/ it might be useful to run ANALYZE again.

Even then, the only thing you're changing is what the query optimizer chooses 
as the best index.  It might make the wrong decision and take 12ms for a SELECT 
instead of 4ms.  Not really a huge problem: if a delay of 8ms is 
mission-critical you're probably using hardware at the edge of its capabilities.

Simon.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to