On Fri, 2014-09-19 at 21:14 -0400, Richard Hipp wrote:
> The 50% faster number above is not about better query plans. 

Speaking of better query plans, though... here's a query which takes
about 1700ms on my data set, followed by a couple of optimisations which
seem like they might be generically useful, and would help it go 300
times faster:

/* This is an address book. The 'main' table holds the primary
   information while the 'email_list' table holds email addresses
   because each contact can have more than one email address */
PRAGMA case_sensitive_like=on;
CREATE TABLE main (uid text primary key, first_name text, last_name text);
INSERT INTO main VALUES('aaaa','alice', 'X');
INSERT INTO main VALUES('bbbb','bert', 'Y');
INSERT INTO main VALUES('cccc','alison', 'Z');
CREATE INDEX main_first_name_index on main (first_name);
CREATE INDEX main_last_name_index on main (last_name);
CREATE TABLE email_list (uid text, email text);
INSERT INTO email_list VALUES('aaaa','al...@example.com');
INSERT INTO email_list VALUES('aaaa','ali...@example.com');
INSERT INTO email_list VALUES('bbbb','alb...@example.org');
CREATE INDEX email_uid_index on email_list (uid);
CREATE INDEX email_email_index on email_list (email);

SELECT DISTINCT main.uid
       FROM main LEFT JOIN email_list ON main.uid = email_list.uid
       WHERE email_list.email LIKE 'al%'
       OR main.first_name like 'al%'
       OR main.last_name like 'al%';
/*
0|0|0|SCAN TABLE main USING INDEX sqlite_autoindex_main_1
0|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
*/

The first optimisation is to realise that sometimes, operations on the
result of a JOIN can be more efficiently resolved by using on the
original tables. Specifically, in this example, there's no point in
looking for a given first_name or last_name in the *joined* table, when
we have indices on those fields in the original 'main' table and can do
it much faster by querying those and then combining the results.

So the original query becomes:

SELECT DISTINCT main.uid
       FROM main LEFT JOIN email_list ON main.uid = email_list.uid
       WHERE email_list.email LIKE 'al%'
UNION SELECT main.uid
       FROM main WHERE main.first_name LIKE 'al%'
       OR main.last_name LIKE 'al%';
/*
1|0|0|SCAN TABLE main USING COVERING INDEX sqlite_autoindex_main_1
1|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>? AND 
first_name<?)
2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND 
last_name<?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
*/

With that optimisation, the time taken on my data set is reduced to
about 1000ms. Which is quite a nice improvement from 1700ms but that's
just the first step...

The second optimisation is to realise that there's no point in that
being a LEFT JOIN any more, given that it's now only ever going to
return results where the right-hand-side actually does exist. Perhaps
normally we'd blame the user for such a naïve query, but when coupled
with the first optimisation, spotting this optimisation does actually
start to make sense. So now the query becomes:

SELECT DISTINCT main.uid
       FROM main JOIN email_list ON main.uid = email_list.uid
       WHERE email_list.email LIKE 'al%'
UNION SELECT main.uid
       FROM main WHERE main.first_name LIKE 'al%'
       OR main.last_name LIKE 'al%';
/*
1|0|1|SEARCH TABLE email_list USING INDEX email_email_index (email>? AND 
email<?)
1|1|0|SEARCH TABLE main USING COVERING INDEX sqlite_autoindex_main_1 (uid=?)
2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>? AND 
first_name<?)
2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND 
last_name<?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
*/

Now the query takes 6ms on my data set.

If this were a hard-coded query, of course it would make sense for me to
just do it this way in the client. But in my case — and I suspect it's
this way in a number of other real-world examples — we're actually just
representing a set of user-input search criteria in the SQL query. To
make this optimisation in the client would be distinctly non-trivial,
and it would be *really* nice if the sqlite query planner could do it.

Is that feasible?

-- 
dwmw2
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to