I’ve got an urgent-to-me issue wherein a customer says that our library has 
slowed down by several orders of magnitude in iOS 10.3 compared to iOS 9. I ran 
a test case they provided and found that some queries that should be fast are 
taking a very long time (~500ms) and generating huge numbers of reads. Using 
EXPLAIN QUERY PLAN I found that the optimizer has generated a horrifically bad 
plan, essentially an inside-out join, that results in linear scans of two large 
tables. (Presumably this is due to some change in the optimizer between the 
older and newer versions of SQLite in iOS.)

Now I need to figure out how to fix this ASAP. Our library periodically runs 
ANALYZE, so sqlite might not even start out using the bad query plan; it might 
happen only with large databases or with some specific data sets.

The query, with some extraneous details removed, looks like:
        SELECT docs.doc_id, revs.sequence, docs.docid, rev.revid, rev.json, […] 
FROM revs, docs 
        WHERE revs.sequence>? AND revs.current!=0 AND revs.doc_id = docs.doc_id 
        ORDER BY revs.doc_id, revs.deleted, revs.revid DESC

The “revs” table has “sequence” as its integer primary key, and a foreign key 
“doc_id” referencing the “docs” table. (Plus many other columns.)
The “docs” table has “doc_id” as its primary key and “docid” as a text column.

Basically this query is meant to find the latest revs, including their “docid” 
strings joined from the “docs” table. It should just be using the “revs” 
table’s primary index starting from the “?” parameter, then looking up the 
corresponding doc for each revision. Instead it does this:

selectid = 0
   order = 0
    from = 1
  detail = SCAN TABLE docs USING COVERING INDEX docs_docid

selectid = 0
   order = 1
    from = 0
  detail = SEARCH TABLE revs USING INDEX revs_current (doc_id=?)

selectid = 0
   order = 0
    from = 0
  detail = USE TEMP B-TREE FOR ORDER BY

It’s scanning through all 50,000+ rows in “docs”, and then for each document 
looking up all the rows in “revs” that reference it (“revs” has 175,000 rows), 
and comparing their sequences to the parameter.

(I’ve pasted the relevant parts of the db schema at the bottom of this email if 
you want to refer to them.)

My first priority is to find some way to restate this query such that it will 
always use the revs.sequence primary key index.

A lower priority is to know whether there’s a valid reason the query planner 
came up with this, or if it’s a bug in the optimizer or the ANALYZE command? In 
the planner’s defense, it probably doesn’t know that this query is almost 
always run with the starting sequence (“?”) relatively close to the maximum 
sequence.  If low numbers were frequently given for “?”, it’s possible that its 
current plan would be more efficient…

Thanks,

—Jens

CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, docid TEXT UNIQUE NOT NULL, 
expiry_timestamp INTEGER);
CREATE INDEX docs_docid ON docs(docid);
CREATE TABLE revs (sequence INTEGER PRIMARY KEY AUTOINCREMENT, doc_id INTEGER 
NOT NULL REFERENCES docs(doc_id) ON DELETE CASCADE, revid TEXT NOT NULL COLLATE 
REVID, parent INTEGER REFERENCES revs(sequence) ON DELETE SET NULL,  current 
BOOLEAN,                deleted BOOLEAN DEFAULT 0, json BLOB, no_attachments 
BOOLEAN, doc_type TEXT, UNIQUE (doc_id, revid));
CREATE INDEX revs_parent ON revs(parent);
CREATE INDEX revs_by_docid_revid ON revs(doc_id, revid desc, current, deleted);
CREATE INDEX revs_current ON revs(doc_id, current desc, deleted, revid desc);

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to