On 10/09/2014 04:38 PM, David Woodhouse wrote:
On Thu, 2014-09-25 at 11:13 +0100, David Woodhouse wrote:
I suggested a couple of specific optimisations which the query planner
might be able to make, which should hopefully have benefits wider than
just my own use case. Are those not viable?
I'm preparing to commit a workaround to Evolution to avoid this issue,
and then move on with my life and forget about it.

Before I do, is it worth me rephrasing this as a 'suboptimal query plan'
bug report so it gets tracked and might get attention later? Or should I
just forget the idea of getting it fixed in sqlite?


Well, you could always create a patch...

I think I understand the second optimization. You're saying that given this:

  SELECT DISTINCT main.uid
         FROM main LEFT JOIN email_list ON main.uid = email_list.uid
         WHERE email_list.email LIKE 'al%'

SQLite should deduce that since the LIKE implies "email_list.email IS NOT NULL" 
the LEFT JOIN is equivalent to a regular JOIN. Which would allow SQLite to reorder the 
tables and perhaps come up with a more efficient query plan. Correct?

Seems like a reasonable idea.

The first optimization might be trickier. With queries that feature a single 
table:

  SELECT cols FROM tbl WHERE a=? OR b=? OR c=?

the planner may elect to run something very close to this:

  SELECT cols FROM tbl WHERE a=?
    UNION ALL
  SELECT cols FROM tbl WHERE b=?
    UNION ALL
  SELECT cols FROM tbl WHERE c=?

However, after returning each row, we remember its PRIMARY KEY (either the 
rowid or real PK for WITHOUT ROWID tables). Similar transformations for 
individual loops within join queries are also possible.

However, with a JOIN query, we don't currently attempt this kind of transform. 
I think because we would have to create some kind of composite key to use in 
place of the PRIMARY KEY to avoid returning duplicates. I guess queries that 
have a DISTINCT clause don't have this problem. So it could in theory transform 
your query to:

  SELECT DISTINCT
       main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
       WHERE email_list.email LIKE 'al%'
    UNION ALL
  SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
       WHERE main.first_name like 'al%'
    UNION ALL
  SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
       WHERE main.last_name like 'al%';

The the hypothetical optimization above could figure out that the first LEFT 
JOIN could just as easily be a JOIN.

And that the other two LEFT JOINs are not required at all due to the DISTINCT 
and the fact that the WHERE clauses on the sub-selects do not reference the 
joined table at all.

It seems like there are a few moving parts here. And none of these are trivial 
changes. So, good ideas that might show up in an SQLite release at some point, 
but it's not realistic to expect these optimizations to be implemented quickly. 
Unless, of course, you can propose a patch and they turn out to be simpler than 
they look.

Regards,
Dan.






_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to