Re: [sqlite] Feature request: merge joins (preferably through a hint)

2017-11-07 Thread Simon Slavin
On 7 Nov 2017, at 7:59am, Davor Josipovic  wrote:

> What sqlite does now is for each "a" it searches through the index for "x".

If an ideal index already exists, accessing the correct records will be fast.  
If one does not exist, how would you expect a merge join to be any faster ?

There are specific cases where a merge join is faster than using JOIN … ORDER 
BY.  For that to happen, both source tables must already have indexes ideally 
suited to the merge join, and the rows which you’re going to want returned must 
be a very large proportion of both source tables, probably the whole tables.  
Also, SQLite has to be aware of those facts, it cannot simply assume them.

Except for the above cases, existing formats will be just as fast, and can be 
far faster, especially in cases where the rows wanted do not represent most of 
the rows of the existing tables.

Merge joins also represent a problem where you have to compare the two 
available rows.  There’s no good way to know what the programmer means by this, 
especially in cases involving many columns and collation methods.  Assumptions 
have to be made and whatever the development team picks is sure to annoy some 
users.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: merge joins (preferably through a hint)

2017-11-07 Thread Davor Josipovic
>  You are thinking that perhaps queries such as the following might
> be faster using a merge:

>
> SELECT * FROM tab1 JOIN tab2 ON tab1.a=tab2.x;

>

> I disagree.


I don't see any reason to disagree. Merge join will definitely be faster if the 
data is already sorted. See the reference: 
https://en.wikipedia.org/wiki/Sort-merge_join. It is a linear time operation.


What sqlite does now is for each "a" it searches through the index for "x". 
This search operation is logarithmic time. If there are many records in tab1, 
then this stacks and becomes quasilinear time. I experience this constantly 
with sqlite data wrangling and tab1 and 2 in the millions. sqlite's nested 
loops are very fast, but the joins _could_ be made much faster with merge joins 
in such situations. I just wish I had this hint...
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: merge joins (preferably through a hint)

2017-11-05 Thread Simon Slavin


On 5 Nov 2017, at 11:04am, Richard Hipp  wrote:

> SQLite does do a merge in some cases, though not for what you would
> traditionally call a join.  For example, SQLite will do a merge to
> combine the two halves of this query:
> 
>SELECT a,b,c FROM tab1 UNION SELECT x,y,z FROM tab2 ORDER BY 1,2,3;

In case it’s not clear from the above, SQL processes the UNION before the ORDER 
BY clause.   If "merge join" didn’t already have a definition, we could use the 
term for that.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: merge joins (preferably through a hint)

2017-11-05 Thread Richard Hipp
On 11/5/17, Davor Josipovic  wrote:
> Merge joins could be an incredible optimization in some cases for large
> queries and would make sqlite much faster in such cases.

SQLite does do a merge in some cases, though not for what you would
traditionally call a join.  For example, SQLite will do a merge to
combine the two halves of this query:

SELECT a,b,c FROM tab1 UNION SELECT x,y,z FROM tab2 ORDER BY 1,2,3;

You are thinking that perhaps queries such as the following might be
faster using a merge:

SELECT * FROM tab1 JOIN tab2 ON tab1.a=tab2.x;

I disagree.  In order to do this as a merge, we'd need indexes on both
tab1.a and tab2.x.  (In order for the merge to be practical, and not
require an arbitrary amount of auxiliary storage, both indexes would
need to be UNIQUE.)  But if you already either one of those two
indexes, then the nested-loop join will already be blazing fast.  It
is difficult to see how switching to a merge join would make it go any
faster.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users