Re: [sqlite] JOIN vs. INTERSECT speed and efficiency

Keith Medcalf Wed, 06 Sep 2017 18:43:58 -0700

Try the same test using 147 columns in each table.

1 column is rather trivial.  Even a kindergarten kid could do it in no time 
using crayons and the wall.


And of course the output of INTERSECT is ordered.  It uses a sorter to perform 
the intersection.  And of course the output is distinct, it uses a sorter to 
perform the intersection.

In other words,

select ... a bunch of columns ...
from table1
intersect
select ... an eual number of bunch of columns ...
from table2

is equivalent to

select ... the bunch of columns ...
  from table1
 where exists (select * from table2
                where (for each column position in table 2 equals that column 
position from table1, plus of course all the added stuff needed to handle 
nullable columns))
group by ... the bunch of columns ...;

In other words except in very trivial cases (like having only one column that 
is not nullable) it will be very difficult to write a "correct" JOIN or 
correlated subquery that emulates an INTERSECT.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[email protected]] On Behalf Of R Smith
>Sent: Wednesday, 6 September, 2017 14:58
>To: [email protected]
>Subject: Re: [sqlite] JOIN vs. INTERSECT speed and efficiency
>
>On 2017/09/06 8:26 PM, Nico Williams wrote:
>> On Wed, Sep 06, 2017 at 08:05:57PM +0200, R Smith wrote:
>>> -- Another interesting thing to note: The INTERSECT test produces
>ORDERED
>>> -- output, which suggests that an ORDER-BY addition to the query
>would
>>> -- favour the INTERSECT method.
>> Nothing about INTERSECT requires it to produce ordered output.
>
>No, and it was not suggested, it was just noted that it does,
>suggesting
>that it could be the more performant choice when adding an ORDER BY
>clause, which turned out to not only be true in terms of being the
>better choice, but also that it itself sped up by simply adding the
>ORDER BY clause as was demonstrated in Test 6.
>
>> Nothing about the JOIN case makes it not possible to produce
>ordered
>> output by accident.
>
>Yet it doesn't seem to by accident, which would suggest that an ORDER
>BY
>clause when added to the JOIN statements would incur an additional
>time
>penalty for having to actually order the results - Yet, as again
>demonstrated in Test 6, the ORDER BY actually sped up the JOIN query
>too
>(perhaps via forcing the Index earlier or used in a different way) -
>which was most interesting, and, as you noted, there is nothing about
>the JOIN that precludes it from having ordered output, so this
>optimization might be worthwhile.
>
>> You'll want to re-measure with an ORDER BY added.
>
>I did. It was done in Test 6. It showed significantly interesting
>results. Was my explanation lacking in clarity or did it fall down
>the
>TLDR; rabbit hole? :)
>
>
>> In any case, this is quite interesting.  Many uses of JOIN are not
>> merely to filter results, but to construct joined result rows --
>such
>> uses of JOIN cannot be optimized by using INTERSECT.  But for
>> filter-uses of JOIN... this might be a useful optimization for the
>> engine to learn.
>
>I agree, and not only the INTERSECT optimization but the tests
>suggest
>adding a silent ORDER BY would also be an optimization, though not
>sure
>if the effort-to-pleasure ratio is low enough yet. Perhaps if re-
>doing
>the tests with tables using several more non-Integer columns to see
>if
>the optimization could be generalized across all kinds of data in
>some
>way. I might pursue this later when I have some time.
>
>
>Cheers,
>Ryan
>_______________________________________________
>sqlite-users mailing list
>[email protected]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] JOIN vs. INTERSECT speed and efficiency

Reply via email to