I’ve got an urgent-to-me issue wherein a customer says that our library has slowed down by several orders of magnitude in iOS 10.3 compared to iOS 9. I ran a test case they provided and found that some queries that should be fast are taking a very long time (~500ms) and generating huge numbers of reads. Using EXPLAIN QUERY PLAN I found that the optimizer has generated a horrifically bad plan, essentially an inside-out join, that results in linear scans of two large tables. (Presumably this is due to some change in the optimizer between the older and newer versions of SQLite in iOS.)
Now I need to figure out how to fix this ASAP. Our library periodically runs ANALYZE, so sqlite might not even start out using the bad query plan; it might happen only with large databases or with some specific data sets. The query, with some extraneous details removed, looks like: SELECT docs.doc_id, revs.sequence, docs.docid, rev.revid, rev.json, […] FROM revs, docs WHERE revs.sequence>? AND revs.current!=0 AND revs.doc_id = docs.doc_id ORDER BY revs.doc_id, revs.deleted, revs.revid DESC The “revs” table has “sequence” as its integer primary key, and a foreign key “doc_id” referencing the “docs” table. (Plus many other columns.) The “docs” table has “doc_id” as its primary key and “docid” as a text column. Basically this query is meant to find the latest revs, including their “docid” strings joined from the “docs” table. It should just be using the “revs” table’s primary index starting from the “?” parameter, then looking up the corresponding doc for each revision. Instead it does this: selectid = 0 order = 0 from = 1 detail = SCAN TABLE docs USING COVERING INDEX docs_docid selectid = 0 order = 1 from = 0 detail = SEARCH TABLE revs USING INDEX revs_current (doc_id=?) selectid = 0 order = 0 from = 0 detail = USE TEMP B-TREE FOR ORDER BY It’s scanning through all 50,000+ rows in “docs”, and then for each document looking up all the rows in “revs” that reference it (“revs” has 175,000 rows), and comparing their sequences to the parameter. (I’ve pasted the relevant parts of the db schema at the bottom of this email if you want to refer to them.) My first priority is to find some way to restate this query such that it will always use the revs.sequence primary key index. A lower priority is to know whether there’s a valid reason the query planner came up with this, or if it’s a bug in the optimizer or the ANALYZE command? In the planner’s defense, it probably doesn’t know that this query is almost always run with the starting sequence (“?”) relatively close to the maximum sequence. If low numbers were frequently given for “?”, it’s possible that its current plan would be more efficient… Thanks, —Jens CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, docid TEXT UNIQUE NOT NULL, expiry_timestamp INTEGER); CREATE INDEX docs_docid ON docs(docid); CREATE TABLE revs (sequence INTEGER PRIMARY KEY AUTOINCREMENT, doc_id INTEGER NOT NULL REFERENCES docs(doc_id) ON DELETE CASCADE, revid TEXT NOT NULL COLLATE REVID, parent INTEGER REFERENCES revs(sequence) ON DELETE SET NULL, current BOOLEAN, deleted BOOLEAN DEFAULT 0, json BLOB, no_attachments BOOLEAN, doc_type TEXT, UNIQUE (doc_id, revid)); CREATE INDEX revs_parent ON revs(parent); CREATE INDEX revs_by_docid_revid ON revs(doc_id, revid desc, current, deleted); CREATE INDEX revs_current ON revs(doc_id, current desc, deleted, revid desc); _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users