This is more of an academic question as I've come up with a better
query, but I was wondering why SQLite doesn't optimise this query.
Let's say I have two tables, simplified here. One contains webpage
contents and a unique hash of those contents (the primary key), the
other contains a history of lookups. The hash is used as a foreign key
between the two tables.
The content table (about 10,000 rows):

    content_hash -- primary key

The lookups table (about 500,000 rows):

    content_hash -- foreign key into the content table, has an index
    is_json      -- a flag indicating if the content is JSON
    is_html      -- a flag indicating if the content is HTML
(Note: I appreciate this is bad design now and the flags should live in
the content table; it evolved organically.)
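
For concreteness, a simplified sketch of the schema (names and columns
are simplified here; the real tables have more in them):

    CREATE TABLE webpage_contents (
        content_hash TEXT PRIMARY KEY,  -- hash of the page contents
        content      TEXT
    );

    -- The lookups live in a separate, attached database (hence the
    -- lookup_tables. prefix in the query below).
    CREATE TABLE lookups (
        content_hash TEXT,     -- foreign key into webpage_contents
        is_json      INTEGER,  -- flag: content is JSON
        is_html      INTEGER   -- flag: content is HTML
    );

    CREATE INDEX content_hash_idx ON lookups(content_hash);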
Now, I want to query the content table post_processing_info, and join in
the flags from the lookups at the same time. After a bit of fiddling
around to stop duplication happening, I got this:

    SELECT ...
    FROM webpage_contents c
    JOIN (SELECT * FROM lookup_tables.lookups GROUP BY content_xxhash) l
      ON l.content_xxhash = c.content_hash
    WHERE c.content_hash = 'abc';
This takes about 2 seconds to run, which is quite slow given
content_hash is indexed in both tables and it's running from an SSD.
I appreciate there are better ways to write this (and I'm using one now;
I was a wee bit meandering getting there), but I was left wondering: why
did SQLite not optimise the subquery?
The Explain Query Plan for that is:
1 0 0 SCAN TABLE crawling_lookups USING INDEX content_hash_idx
0 0 0 SEARCH TABLE webpage_contents AS c USING INDEX
0 1 1 SCAN SUBQUERY 1 AS l
It seems to be doing the GROUP BY on the subselect using the index (fast -
about 0.1 seconds), then at step 3 doing a scan of the subquery results
for the content_hash. That would be the slow part, as the materialised
subquery results don't have an index on them.
I figured SQLite would spot that the WHERE clause in the outermost scope
also applies to the subselect and push it down into the subquery
automatically, so the GROUP BY would only run over the rows the WHERE
clause matched.
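
In other words, I expected it to behave roughly as if I had written
something like the following, with the filter pushed into the subquery by
hand (names as in the query above):

    SELECT ...
    FROM webpage_contents c
    JOIN (SELECT *
            FROM lookup_tables.lookups
           WHERE content_xxhash = 'abc'
           GROUP BY content_xxhash) l
      ON l.content_xxhash = c.content_hash
    WHERE c.content_hash = 'abc';

That way the lookups index only has to be consulted for the one hash,
rather than grouping all 500,000 rows and then scanning the result.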
Is this an optimisation opportunity? Or is my SQL so bad it was
inevitable (more likely)?