[sqlite] Query planner improvements in case of AUTOMATIC INDEX

Rob Golsteijn Wed, 19 Sep 2018 02:21:02 -0700

Hi List,

When investigating performance of one of our queries I found an interesting 
situation that might be an opportunity for performance improvement.
Tested with Sqlite version 3.15.2 (November 2016).


Consider the following table and query

CREATE TABLE Node
(
    Id              INTEGER PRIMARY KEY AUTOINCREMENT,
    x               NUMBER(10),
    y               NUMBER(10), 
    HeightLevel     NUMBER(2),
    GeomWGS84       BLOB,
    Perm_id         TEXT
    /* some irrelevant fields removed */
);

/* find duplicates */
SELECT NOD1.PERM_ID,
       NOD1.X,
       NOD1.Y,
       NOD1.HeightLevel,
       NOD1.GeomWGS84
  FROM            Node NOD1
       INNER JOIN Node NOD2 ON NOD1.X           = NOD2.X
                           AND NOD1.Y           = NOD2.Y 
                           AND NOD1.HeightLevel = NOD2.HeightLevel
                           AND NOD1.GeomWGS84   = NOD2.GeomWGS84
                           AND NOD1.ID         <> NOD2.ID
  ORDER BY NOD1.GeomWGS84;

The query plan of this query is
selectid|order|from|detail
0|0|0|SCAN TABLE Node AS NOD1
0|1|1|SEARCH TABLE Node AS NOD2 USING AUTOMATIC COVERING INDEX (GeomWGS84=? AND 
HeightLevel=? AND y=? AND x=?)
0|0|0|USE TEMP B-TREE FOR ORDER BY

It takes 636 seconds wall clock time to execute the query.

I would expect that this is the execution time of creating the index + the 
execution time of the query when the index is already present.
So if I create the "AUTOMATIC" index explicitly the expected total execution 
time would also be 636 seconds, but

CREATE INDEX idx_node on node (GeomWGS84, HeightLevel, y, x);
Takes 40 seconds and subsequent query execution takes 8 seconds. So 48 seconds 
in total compared to 636 second with the AUTOMATIC query.

The explanation can be found when looking at the query plan of the query (in 
the new schema with index present):

selectid|order|from|detail
0|0|0|SCAN TABLE DH_NOD AS NOD1 USING INDEX idx_node
0|1|1|SEARCH TABLE DH_NOD AS NOD2 USING COVERING INDEX idx_node (GeomWGS84=? 
AND HeightLevel=? AND y=? AND x=?)

So the explicit index is now also used for ORDER BY optimization.

I guess in general it could be used for other optimizations as well .


The optimization possibility is to re-evaluate the query plan, taking also the 
AUTOMATIC indexes into account, once Sqlite decided that AUTOMATIC indexes are 
useful. 
To avoid extra planning time, maybe this should only be done when AUTOMATICALLY 
INDEXED table(s) are used multiple times in the query (otherwise they will not 
change the query plan anyway)?
Since query planning is typically fast compared to query execution, the extra 
iteration of the query planner may be acceptable for the cases the query plan 
cannot be improved. For our company it would be acceptable but in general I 
cannot judge.

Regards,
Rob Golsteijn
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] Query planner improvements in case of AUTOMATIC INDEX

Reply via email to