On Thu, 2014-10-09 at 17:41 +0700, Dan Kennedy wrote:
> I think I understand the second optimization. You're saying that given this:
> 
>    SELECT DISTINCT main.uid
>           FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>           WHERE email_list.email LIKE 'al%'
> 
> SQLite should deduce that since the LIKE implies "email_list.email IS
> NOT NULL" 

(
    Digression: SQLite *should* deduce the LIKE implies the IS NOT NULL
    condition. My *actual* queries all have, for some bizarre
    historical reason I haven't yet explored,

        email_list.value IS NOT NULL AND email_list.value LIKE 'foo%'

    You'd think that the former condition would end up optimised away,
    but timings show that it clearly isn't. I wasn't going to bring that
    up here since I think it is so clearly a stupid thing to be putting
    in our query in the first place. But you just made me do it anyway.
)

> the LEFT JOIN is equivalent to a regular JOIN. Which would allow
> SQLite to reorder the tables and perhaps come up with a more efficient
> query plan. Correct?

Indeed.

As I said, a glib response to this suggested optimisation in *isolation*
might be that the user shouldn't have asked for a LEFT JOIN at all if
they didn't need it. But of course when coupled with the other suggested
transforms, that glib response is a lot less valid.

> 
> The first optimization might be trickier.
 ...
>  I guess queries that have a DISTINCT clause don't have this problem.

Right. I think it only works with DISTINCT, but that's OK.

>  So it could in theory transform your query to:
> 
>    SELECT DISTINCT
>         main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>         WHERE email_list.email LIKE 'al%'
>      UNION ALL
>    SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>         WHERE main.first_name like 'al%'
>      UNION ALL
>    SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>         WHERE main.last_name like 'al%';
> 
> The the hypothetical optimization above could figure out that the
> first LEFT JOIN could just as easily be a JOIN.

Right.

> And that the other two LEFT JOINs are not required at all due to the
> DISTINCT and the fact that the WHERE clauses on the sub-selects do not
> reference the joined table at all.

Yeah, that's probably true. I'd assumed that we'd do it as part of the
"first optimisation" introducing the UNION but it's probably better
handled as a separate transform as you have it here.

> It seems like there are a few moving parts here. And none of these are
> trivial changes. So, good ideas that might show up in an SQLite
> release at some point, but it's not realistic to expect these
> optimizations to be implemented quickly.

TBH that's all I was really looking for; thanks. I *hate* it when people
silently work around limitations in my software without bringing them to
my attention, and I didn't want to be guilty of doing the same.

I certainly don't expect an instant fix; just knowing that it's on the
radar is perfectly sufficient. Now I can commit my hackish workaround to
rewrite certain special-case queries to optimise the output of the
current query planner for them, and still sleep at night :)

Thanks.

>  Unless, of course, you can propose a patch and they turn out to be
> simpler than they look.

That would be interesting but realistically it's outside my capacity at
the moment. I am already *so* far into shaving this yak that I can
barely remember what I was actually intending to do in the first
place :)

-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to