Would it not be more efficient to skip the join altogether, since all you want 
is the list of uids — and assuming that you have maintained the referential 
integrity of your database, i.e. email_list(uid) references main(uid)?

SELECT uid 
  FROM email_list
 WHERE email LIKE 'al%'
UNION 
SELECT uid
  FROM main 
 WHERE first_name LIKE 'al%'
    OR last_name LIKE 'al%';
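
A quick sketch of that (not from the original post) using Python's bundled
sqlite3 module, rebuilding the sample schema from the quoted message below;
the email addresses are stand-ins, since the originals were truncated in the
archive. Note that UNION already removes duplicates, so no DISTINCT is needed:

```python
import sqlite3

# Rebuild the sample data from the quoted message (addresses are stand-ins).
con = sqlite3.connect(":memory:")
con.executescript("""
    PRAGMA case_sensitive_like = ON;
    CREATE TABLE main (uid TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO main VALUES ('aaaa', 'alice',  'X');
    INSERT INTO main VALUES ('bbbb', 'bert',   'Y');
    INSERT INTO main VALUES ('cccc', 'alison', 'Z');
    CREATE TABLE email_list (uid TEXT, email TEXT);
    INSERT INTO email_list VALUES ('aaaa', 'al1@example.com');
    INSERT INTO email_list VALUES ('aaaa', 'ali@example.com');
    INSERT INTO email_list VALUES ('bbbb', 'alb@example.org');
""")

# The join-free formulation: UNION deduplicates, so DISTINCT is unnecessary.
rows = con.execute("""
    SELECT uid FROM email_list WHERE email LIKE 'al%'
    UNION
    SELECT uid FROM main
     WHERE first_name LIKE 'al%' OR last_name LIKE 'al%'
""").fetchall()
print(sorted(r[0] for r in rows))  # → ['aaaa', 'bbbb', 'cccc']
```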

>-----Original Message-----
>Speaking of better query plans, though... here's a query which takes
>about 1700ms on my data set, followed by a couple of optimisations which
>seem like they might be generically useful, and would help it go 300
>times faster:
>
>/* This is an address book. The 'main' table holds the primary
>   information while the 'email_list' table holds email addresses
>   because each contact can have more than one email address */
>PRAGMA case_sensitive_like=on;
>CREATE TABLE main (uid text primary key, first_name text, last_name text);
>INSERT INTO main VALUES('aaaa','alice', 'X');
>INSERT INTO main VALUES('bbbb','bert', 'Y');
>INSERT INTO main VALUES('cccc','alison', 'Z');
>CREATE INDEX main_first_name_index on main (first_name);
>CREATE INDEX main_last_name_index on main (last_name);
>CREATE TABLE email_list (uid text, email text);
>INSERT INTO email_list VALUES('aaaa','al...@example.com');
>INSERT INTO email_list VALUES('aaaa','ali...@example.com');
>INSERT INTO email_list VALUES('bbbb','alb...@example.org');
>CREATE INDEX email_uid_index on email_list (uid);
>CREATE INDEX email_email_index on email_list (email);
>
>SELECT DISTINCT main.uid
>       FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>       WHERE email_list.email LIKE 'al%'
>       OR main.first_name LIKE 'al%'
>       OR main.last_name LIKE 'al%';
>/*
>0|0|0|SCAN TABLE main USING INDEX sqlite_autoindex_main_1
>0|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
>*/
>
>The first optimisation is to realise that sometimes, operations on the
>result of a JOIN can be resolved more efficiently by operating on the
>original tables. Specifically, in this example, there's no point in
>looking for a given first_name or last_name in the *joined* table, when
>we have indices on those fields in the original 'main' table and can do
>it much faster by querying those and then combining the results.
>
>So the original query becomes:
>
>SELECT DISTINCT main.uid
>       FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>       WHERE email_list.email LIKE 'al%'
>UNION SELECT main.uid
>       FROM main WHERE main.first_name LIKE 'al%'
>       OR main.last_name LIKE 'al%';
>/*
>1|0|0|SCAN TABLE main USING COVERING INDEX sqlite_autoindex_main_1
>1|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
>2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>? AND first_name<?)
>2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND last_name<?)
>0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
>*/
>
>With that optimisation, the time taken on my data set is reduced to
>about 1000ms. Which is quite a nice improvement from 1700ms but that's
>just the first step...
>
>The second optimisation is to realise that there's no point in that
>being a LEFT JOIN any more, given that it's now only ever going to
>return results where the right-hand-side actually does exist. Perhaps
>normally we'd blame the user for such a naïve query, but when coupled
>with the first optimisation, spotting this optimisation does actually
>start to make sense. So now the query becomes:
>
>SELECT DISTINCT main.uid
>       FROM main JOIN email_list ON main.uid = email_list.uid
>       WHERE email_list.email LIKE 'al%'
>UNION SELECT main.uid
>       FROM main WHERE main.first_name LIKE 'al%'
>       OR main.last_name LIKE 'al%';
>/*
>1|0|1|SEARCH TABLE email_list USING INDEX email_email_index (email>? AND email<?)
>1|1|0|SEARCH TABLE main USING COVERING INDEX sqlite_autoindex_main_1 (uid=?)
>2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>? AND first_name<?)
>2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND last_name<?)
>0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
>*/
>
>Now the query takes 6ms on my data set.
>
>If this were a hard-coded query, of course it would make sense for me to
>just do it this way in the client. But in my case — and I suspect it's
>this way in a number of other real-world examples — we're actually just
>representing a set of user-input search criteria in the SQL query. To
>make this optimisation in the client would be distinctly non-trivial,
>and it would be *really* nice if the sqlite query planner could do it.
>
>Is that feasible?
>
>--
>dwmw2
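
For anyone who wants to reproduce the comparison above, here's a rough sketch
using Python's bundled sqlite3 module. The email addresses are stand-ins (the
originals were truncated in the archive), and the quoted timings obviously
won't reproduce on three rows, but the result equivalence of the original and
rewritten queries can be checked, and EXPLAIN QUERY PLAN will show what your
SQLite version chooses:

```python
import sqlite3

# Sample schema and data from the quoted message (addresses are stand-ins).
con = sqlite3.connect(":memory:")
con.executescript("""
    PRAGMA case_sensitive_like = ON;
    CREATE TABLE main (uid TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO main VALUES ('aaaa', 'alice',  'X');
    INSERT INTO main VALUES ('bbbb', 'bert',   'Y');
    INSERT INTO main VALUES ('cccc', 'alison', 'Z');
    CREATE INDEX main_first_name_index ON main (first_name);
    CREATE INDEX main_last_name_index  ON main (last_name);
    CREATE TABLE email_list (uid TEXT, email TEXT);
    INSERT INTO email_list VALUES ('aaaa', 'al1@example.com');
    INSERT INTO email_list VALUES ('aaaa', 'ali@example.com');
    INSERT INTO email_list VALUES ('bbbb', 'alb@example.org');
    CREATE INDEX email_uid_index   ON email_list (uid);
    CREATE INDEX email_email_index ON email_list (email);
""")

# The original single-query formulation.
original = """
    SELECT DISTINCT main.uid
      FROM main LEFT JOIN email_list ON main.uid = email_list.uid
     WHERE email_list.email LIKE 'al%'
        OR main.first_name LIKE 'al%'
        OR main.last_name LIKE 'al%'
"""

# The rewritten formulation after both optimisations: inner JOIN plus UNION.
optimised = """
    SELECT DISTINCT main.uid
      FROM main JOIN email_list ON main.uid = email_list.uid
     WHERE email_list.email LIKE 'al%'
    UNION
    SELECT main.uid FROM main
     WHERE main.first_name LIKE 'al%' OR main.last_name LIKE 'al%'
"""

a = sorted(r[0] for r in con.execute(original))
b = sorted(r[0] for r in con.execute(optimised))
assert a == b  # same uids either way

# Inspect the plan your SQLite picks (the quoted plan used email_email_index).
for row in con.execute("EXPLAIN QUERY PLAN " + optimised):
    print(row)
```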



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
