On 8 Feb 2011, at 2:39pm, Sven L wrote: > Is it reasonable to guess that sqlite_stat* are used only for NATURAL JOINs > and other not-so-obvious joins?
No. Consider this: SELECT * FROM myTable WHERE a=104 AND b=213 Suppose there are two indexes on myTable: one indexes only column a, and the other indexes only column b. The query optimizer has to choose between them. Which index would it be best to use ? The answer depends on the chunkiness of each column. If only three different values ever appear in column a, but column b can have a thousand different values, then it will be more efficient to use the index on column b. This will return fewer rows which have to be scanned one-by-one for values in a. But you're a good programmer. Knowing that you were going to have SELECT commands like that one above you chose to create an index on both columns (either order, it doesn't matter). Since you have done this, the results that ANALYZE gathers don't matter at all ! The query optimizer finds the good index and never has to choose between two bad indexes because it has the one perfect index which will always be best. So generally speaking ANALYZE matters only if the query optimizer has two or more equally bad options. If you do your job as a database designer well it will never need to consider chunkiness. The above explanation is simplified but gives you the general idea. > In my software, the database sometimes grows up to 1-2 GB immediately due to > the nature of the application. Hence, the guidelines for the ANALYZE command > do not suffice for me. "Do it once every x months" they say. User chooses to > import huge amounts of data on regular basis. He is also able to run custom > queries, which is why I found the sqlite_stat-tables useful. Doesn't matter. This new data won't change the relative chunkiness of the values in the columns. Though the precise numbers change, the /relative/ chunkiness probably won't, so the choice of indexes probably won't need to change either. Even less likely that it'll change by enough to change which index the query optimiser should pick. Doing another ANALYZE only really matters if the character of your data changes, which is usually when a column that used to have very big chunks suddenly has tiny chunks. The only time this happens if when a business changes what it does in a significant way: going from having five product lines to 500, or from selling in three countries to selling in thirty countries. After they've been putting in new data reflecting this new usage for a few months, /then/ it might be useful to run ANALYZE again. Even then, the only thing you're changing is what the query optimizer chooses as the best index. It might make the wrong decision and take 12ms for a SELECT instead of 4ms. Not really a huge problem: if a delay of 8ms is mission-critical you're probably using hardware at the edge of its capabilities. Simon. _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

