[sqlite] query execution plan inconsistency?

Petite Abeille Sat, 26 Apr 2008 15:17:39 -0700

Hello,

[sqlite 3.5.8, Mac OS X 10.5.2]


Given the following schema [1]:

document <->> document_token <<-> token

I'm trying to retrieve some documents with a set of token(s) [2].

Things works rather fine when looking for multiple tokens:

select      sum( document_token.weight ) as score,
             document.name as name
from        (
                 select  id
                 from    token
                 where   name >= 'america' and name < 'americb'

                 union
                 select  id
                 from    token
                 where   name >= 'america' and name < 'americb'
             ) as token
join        document_token on document_token.token_id = token.id
join        document on document.id = document_token.document_id
group by    document.name
having      count( * ) >= 2
order by    1 desc,
             2
limit       5;

0|0|TABLE token WITH INDEX token_name
0|0|TABLE token WITH INDEX token_name
0|0|TABLE  AS token
1|1|TABLE document_token WITH INDEX document_token_token_id
2|2|TABLE document USING PRIMARY KEY

CPU Time: user 0.130602 sys 0.073889

Unfortunately, with only one token value, the execution time goes  
through the roof:

select      sum( document_token.weight ) as score,
             document.name as name
from        (
                 select  id
                 from    token
                 where   name >= 'america' and name < 'americb'
             ) as token
join        document_token on document_token.token_id = token.id
join        document on document.id = document_token.document_id
group by    document.name
having      count( * ) >= 1
order by    1 desc,
             2
limit       5;

0|2|TABLE document WITH INDEX document_name ORDER BY
1|1|TABLE document_token WITH INDEX document_token_document_id
2|0|TABLE token USING PRIMARY KEY

CPU Time: user 14.673524 sys 1.378078

This seems to be due to the fact that in the second case SQLite will  
favor the document table to lead the query, resulting in pretty much a  
full scan of the entire database as the restricting qualifier is on  
the token table, not the document.

What would be a reasonable way to make those two queries use a similar  
execution plan (e.g. token -> document_token -> document instead of  
document -> document_token -> token)?

Thanks in advance.

Kind regards,

--
PA.
http://alt.textdrive.com/nanoki/


[1] http://dev.alt.textdrive.com/browser/HTTP/Finder.ddl
[2] http://dev.alt.textdrive.com/browser/HTTP/Finder.dml#L70

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] query execution plan inconsistency?

Reply via email to