Hi List, When investigating performance of one of our queries I found an interesting situation that might be an opportunity for performance improvement. Tested with Sqlite version 3.15.2 (November 2016).
Consider the following table and query CREATE TABLE Node ( Id INTEGER PRIMARY KEY AUTOINCREMENT, x NUMBER(10), y NUMBER(10), HeightLevel NUMBER(2), GeomWGS84 BLOB, Perm_id TEXT /* some irrelevant fields removed */ ); /* find duplicates */ SELECT NOD1.PERM_ID, NOD1.X, NOD1.Y, NOD1.HeightLevel, NOD1.GeomWGS84 FROM Node NOD1 INNER JOIN Node NOD2 ON NOD1.X = NOD2.X AND NOD1.Y = NOD2.Y AND NOD1.HeightLevel = NOD2.HeightLevel AND NOD1.GeomWGS84 = NOD2.GeomWGS84 AND NOD1.ID <> NOD2.ID ORDER BY NOD1.GeomWGS84; The query plan of this query is selectid|order|from|detail 0|0|0|SCAN TABLE Node AS NOD1 0|1|1|SEARCH TABLE Node AS NOD2 USING AUTOMATIC COVERING INDEX (GeomWGS84=? AND HeightLevel=? AND y=? AND x=?) 0|0|0|USE TEMP B-TREE FOR ORDER BY It takes 636 seconds wall clock time to execute the query. I would expect that this is the execution time of creating the index + the execution time of the query when the index is already present. So if I create the "AUTOMATIC" index explicitly the expected total execution time would also be 636 seconds, but CREATE INDEX idx_node on node (GeomWGS84, HeightLevel, y, x); Takes 40 seconds and subsequent query execution takes 8 seconds. So 48 seconds in total compared to 636 second with the AUTOMATIC query. The explanation can be found when looking at the query plan of the query (in the new schema with index present): selectid|order|from|detail 0|0|0|SCAN TABLE DH_NOD AS NOD1 USING INDEX idx_node 0|1|1|SEARCH TABLE DH_NOD AS NOD2 USING COVERING INDEX idx_node (GeomWGS84=? AND HeightLevel=? AND y=? AND x=?) So the explicit index is now also used for ORDER BY optimization. I guess in general it could be used for other optimizations as well . The optimization possibility is to re-evaluate the query plan, taking also the AUTOMATIC indexes into account, once Sqlite decided that AUTOMATIC indexes are useful. To avoid extra planning time, maybe this should only be done when AUTOMATICALLY INDEXED table(s) are used multiple times in the query (otherwise they will not change the query plan anyway)? Since query planning is typically fast compared to query execution, the extra iteration of the query planner may be acceptable for the cases the query plan cannot be improved. For our company it would be acceptable but in general I cannot judge. Regards, Rob Golsteijn _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users