Try the same test using 147 columns in each table.
1 column is rather trivial. Even a kindergarten kid could do it in no time
using crayons and the wall.
And of course the output of INTERSECT is ordered. It uses a sorter to perform
the intersection. And of course the output is distinct, it uses a sorter to
perform the intersection.
In other words,
select ... a bunch of columns ...
select ... an eual number of bunch of columns ...
is equivalent to
select ... the bunch of columns ...
where exists (select * from table2
where (for each column position in table 2 equals that column
position from table1, plus of course all the added stuff needed to handle
group by ... the bunch of columns ...;
In other words except in very trivial cases (like having only one column that
is not nullable) it will be very difficult to write a "correct" JOIN or
correlated subquery that emulates an INTERSECT.
The fact that there's a Highway to Hell but only a Stairway to Heaven says a
lot about anticipated traffic volume.
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of R Smith
>Sent: Wednesday, 6 September, 2017 14:58
>Subject: Re: [sqlite] JOIN vs. INTERSECT speed and efficiency
>On 2017/09/06 8:26 PM, Nico Williams wrote:
>> On Wed, Sep 06, 2017 at 08:05:57PM +0200, R Smith wrote:
>>> -- Another interesting thing to note: The INTERSECT test produces
>>> -- output, which suggests that an ORDER-BY addition to the query
>>> -- favour the INTERSECT method.
>> Nothing about INTERSECT requires it to produce ordered output.
>No, and it was not suggested, it was just noted that it does,
>that it could be the more performant choice when adding an ORDER BY
>clause, which turned out to not only be true in terms of being the
>better choice, but also that it itself sped up by simply adding the
>ORDER BY clause as was demonstrated in Test 6.
>> Nothing about the JOIN case makes it not possible to produce
>> output by accident.
>Yet it doesn't seem to by accident, which would suggest that an ORDER
>clause when added to the JOIN statements would incur an additional
>penalty for having to actually order the results - Yet, as again
>demonstrated in Test 6, the ORDER BY actually sped up the JOIN query
>(perhaps via forcing the Index earlier or used in a different way) -
>which was most interesting, and, as you noted, there is nothing about
>the JOIN that precludes it from having ordered output, so this
>optimization might be worthwhile.
>> You'll want to re-measure with an ORDER BY added.
>I did. It was done in Test 6. It showed significantly interesting
>results. Was my explanation lacking in clarity or did it fall down
>TLDR; rabbit hole? :)
>> In any case, this is quite interesting. Many uses of JOIN are not
>> merely to filter results, but to construct joined result rows --
>> uses of JOIN cannot be optimized by using INTERSECT. But for
>> filter-uses of JOIN... this might be a useful optimization for the
>> engine to learn.
>I agree, and not only the INTERSECT optimization but the tests
>adding a silent ORDER BY would also be an optimization, though not
>if the effort-to-pleasure ratio is low enough yet. Perhaps if re-
>the tests with tables using several more non-Integer columns to see
>the optimization could be generalized across all kinds of data in
>way. I might pursue this later when I have some time.
>sqlite-users mailing list
sqlite-users mailing list