[sqlite] Why doesn't SQLite optimise this subselect?

Jonathan Moules Sat, 06 Jan 2018 07:23:17 -0800

Hi All,

This is more of an academic question as I've come up with a better query, but I was wondering why SQLite doesn't optimise this query.

Let's say I have two tables, simplified here. One contains webpage contents and a unique hash of those contents (the primary key); the other contains a history of lookups. The hash is used as a foreign key between the two tables.


Table: webpage_contents
content_hash -- Primary key
post_processing_info
page_content

This table has about 10,000 rows

Table: crawling_lookups
content_hash -- foreign key, has an index
is_json -- a flag indicating if it's JSON
is_html -- a flag indicating if it's HTML

This table has 500,000 rows

(Note: I appreciate this is bad design now and the flags should be in webpage content; it evolved organically).
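For anyone wanting to reproduce this, the simplified schema above could be sketched roughly as follows. This is only an illustration: the column types are assumptions (the description gives names only), and Python's sqlite3 module is used purely as a convenient harness. The index name content_hash_idx is taken from the query plan output further down.

```python
import sqlite3

# Hypothetical DDL reconstructed from the simplified description above.
# Column types are assumptions; only the column names come from the post.
SCHEMA = """
CREATE TABLE webpage_contents (
    content_hash         TEXT PRIMARY KEY,
    post_processing_info TEXT,
    page_content         TEXT
);
CREATE TABLE crawling_lookups (
    content_hash TEXT REFERENCES webpage_contents(content_hash),
    is_json      INTEGER,  -- flag: 1 if the page is JSON
    is_html      INTEGER   -- flag: 1 if the page is HTML
);
CREATE INDEX content_hash_idx ON crawling_lookups(content_hash);
"""


def make_db():
    """Create an in-memory database holding the two simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = make_db()
# List the tables that were created, sorted by name.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```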

Now, I want to query the content table post_processing_info, and join in the flags from the lookups at the same time. After a bit of fiddling around to stop duplication happening, I got this:


SELECT
    w.post_processing_info,
    l.is_json
FROM
    webpage_contents w
JOIN
    (SELECT * FROM crawling_lookups GROUP BY content_hash) l
    USING (content_hash)
WHERE
    content_hash = 'abc'

This takes about 2 seconds to run, which is quite slow given content_hash is indexed in both tables and it's running from an SSD.

I appreciate there are better ways to write this (and I'm using one now; I was a wee bit meandering getting there though), but I was left wondering: why did SQLite not optimise the subquery?


The Explain Query Plan for that is:

1    0    0    SCAN TABLE crawling_lookups USING INDEX content_hash_idx
0    0    0    SEARCH TABLE webpage_contents AS c USING INDEX sqlite_autoindex_page_contents_1 (content_hash=?)
0    1    1    SCAN SUBQUERY 1 AS l

It seems to be doing the group on the subselect using the index (fast, about 0.1 seconds), then at step 3 doing a scan of the subquery results for the content_hash. This would be the slow part as the subquery result doesn't have an index.

I figured SQLite would spot that the WHERE clause in the outermost scope would also apply to the subselect and put it in there automatically, so the GROUP BY would only be for the rows the WHERE clause affected.
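To make the expectation concrete, here is a minimal sketch of applying the outer filter inside the subselect by hand. It is not necessarily the better query mentioned at the top, just one way to express the idea. The column types and the toy rows are assumptions made for illustration, and Python's sqlite3 module is used only as a harness. The printed plan will differ between SQLite versions, so no particular output is claimed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy data: the types and rows are assumptions, not the real tables.
conn.executescript("""
CREATE TABLE webpage_contents (
    content_hash TEXT PRIMARY KEY,
    post_processing_info TEXT,
    page_content TEXT
);
CREATE TABLE crawling_lookups (
    content_hash TEXT,
    is_json INTEGER,
    is_html INTEGER
);
CREATE INDEX content_hash_idx ON crawling_lookups(content_hash);
INSERT INTO webpage_contents VALUES
    ('abc', 'info-abc', '{}'),
    ('def', 'info-def', '<html/>');
INSERT INTO crawling_lookups VALUES
    ('abc', 1, 0), ('abc', 1, 0), ('def', 0, 1);
""")

# The shape of the query from the post: the filter is only in the outer scope.
ORIGINAL = """
SELECT w.post_processing_info, l.is_json
FROM webpage_contents w
JOIN (SELECT * FROM crawling_lookups GROUP BY content_hash) l
    USING (content_hash)
WHERE content_hash = ?
"""

# Same query with the filter repeated inside the subselect, so the
# GROUP BY only has to look at rows for the requested hash.
PUSHED_DOWN = """
SELECT w.post_processing_info, l.is_json
FROM webpage_contents w
JOIN (SELECT * FROM crawling_lookups
      WHERE content_hash = ?
      GROUP BY content_hash) l
    USING (content_hash)
WHERE content_hash = ?
"""

original_rows = conn.execute(ORIGINAL, ("abc",)).fetchall()
pushed_rows = conn.execute(PUSHED_DOWN, ("abc", "abc")).fetchall()
print(original_rows)
print(pushed_rows)

# Show how this SQLite build plans the rewritten query.
for step in conn.execute("EXPLAIN QUERY PLAN " + PUSHED_DOWN, ("abc", "abc")):
    print(step)
```

On the toy data both forms return the same rows; whether the rewrite is faster on the real tables would need to be measured there.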

Is this an optimisation opportunity? Or is my SQL so bad it was inevitable (more likely)?


Cheers,
Jonathan
