On 10/09/2014 04:38 PM, David Woodhouse wrote:
On Thu, 2014-09-25 at 11:13 +0100, David Woodhouse wrote:
I suggested a couple of specific optimisations which the query planner
might be able to make, which should hopefully have benefits wider than
just my own use case. Are those not viable?
I'm preparing to commit a workaround to Evolution to avoid this issue,
and then move on with my life and forget about it.
Before I do, is it worth me rephrasing this as a 'suboptimal query plan'
bug report so it gets tracked and might get attention later? Or should I
just forget the idea of getting it fixed in sqlite?
Well, you could always create a patch...
I think I understand the second optimization. You're saying that given this:
SELECT DISTINCT main.uid
FROM main LEFT JOIN email_list ON main.uid = email_list.uid
WHERE email_list.email LIKE 'al%'
SQLite should deduce that since the LIKE implies "email_list.email IS NOT NULL"
the LEFT JOIN is equivalent to a regular JOIN. Which would allow SQLite to reorder the
tables and perhaps come up with a more efficient query plan. Correct?
Seems like a reasonable idea.
The first optimization might be trickier. With queries that feature a single
table:
SELECT cols FROM tbl WHERE a=? OR b=? OR c=?
the planner may elect to run something very close to this:
SELECT cols FROM tbl WHERE a=?
UNION ALL
SELECT cols FROM tbl WHERE b=?
UNION ALL
SELECT cols FROM tbl WHERE c=?
However, after returning each row, we remember its PRIMARY KEY (either the
rowid or real PK for WITHOUT ROWID tables). Similar transformations for
individual loops within join queries are also possible.
However, with a JOIN query, we don't currently attempt this kind of transform.
I think because we would have to create some kind of composite key to use in
place of the PRIMARY KEY to avoid returning duplicates. I guess queries that
have a DISTINCT clause don't have this problem. So it could in theory transform
your query to:
SELECT DISTINCT
main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
WHERE email_list.email LIKE 'al%'
UNION ALL
SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
WHERE main.first_name like 'al%'
UNION ALL
SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
WHERE main.last_name like 'al%';
The the hypothetical optimization above could figure out that the first LEFT
JOIN could just as easily be a JOIN.
And that the other two LEFT JOINs are not required at all due to the DISTINCT
and the fact that the WHERE clauses on the sub-selects do not reference the
joined table at all.
It seems like there are a few moving parts here. And none of these are trivial
changes. So, good ideas that might show up in an SQLite release at some point,
but it's not realistic to expect these optimizations to be implemented quickly.
Unless, of course, you can propose a patch and they turn out to be simpler than
they look.
Regards,
Dan.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users