Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-18 Thread Poor Yorick
On Wed, Apr 17, 2019 at 11:43:13AM -0600, Keith Medcalf wrote:
> 
> Your made up plans are intriguing.  The plan you show for the latter query 
> omit to join a and b.  Are you just making things up?

The query plans were cut and pasted from the terminal.  It's easy enough to
deduce where these plans came from:  As someone else pointed out, the ",b" in
the second query shouldn't be there, so I removed it before generating the
query plans.  That step of the query plan is irrelevant anyway.  The point is
that in the subselect variant the query the planner chooses this 


  
  7 0 0 {USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR} 

  

which, given the conditions, is a far better choise than what the planner
chooses in the "join" variant: 


  
  8 0 0 {SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)}

  

It would be easy enough again for the planner to deduce this, but as Ryan
Smith described, may not be worth doing in the general case.  I don't know.
I'm just reporting in from the field.


-- 
Poor Yorick

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-17 Thread Keith Medcalf

Your made up plans are intriguing.  The plan you show for the latter query omit 
to join a and b.  Are you just making things up?

sqlite> select a.rowid from a, b where a.ref=7 and a.rowid in (select rowid 
from b);
QUERY PLAN
|--SEARCH TABLE a USING COVERING INDEX aa (ref=? AND rowid=?) (~8 rows)
|--USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR
`--SCAN TABLE b (~1048576 rows)

However, assuming that you typo'd the query (please learn about this 
new-fangled thing called cut-n-paste to avoid that error in the future) you get 
this if you specify "FROM a" rather than "FROM a, b".

sqlite> select rowid from a where ref=7 and rowid in (select rowid from b);
QUERY PLAN
|--SEARCH TABLE a USING COVERING INDEX aa (ref=? AND rowid=?) (~8 rows)
`--USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR

In any case, upon closer examination the VDBE code for the two queries "solves" 
the query quite differently:

select a.rowid from a, b where a.rowid == b.rowid and a.ref == ?

places the index constrained by ref = ? in the outer loop, and then if a.rowid 
exists in table b, returns a result row.  This means that the number of times 
the outer loop is executed is dependent on the constraint ref == ? on the 
index.  Since there can be no possible statistic for the a.rowid == b.rowid 
condition, table a (or rather the index) will and must always be in the outer 
loop, even if the constraint ref == ? selects all rows in table a and there is 
only one row in table b ...

select rowid from a where ref == ? and rowid in (select rowid from b)

however scans table b in the outer loop and then does an index lookup on (ref, 
rowid) into the index and returns a result row whenever it is found.  This 
means that the number of times the outer loop is executed is dependent on the 
number of rows in table b (only).  Since the QP can determine the number of 
rows in a where ref == 7 and the number of rows in b, the plan will probably be 
optimized by having statistics available.  Do you have statistics available?  
Or do you just by happenstance have less rows in b than in a constrained by ref 
== ?.  Do you have statistics?  Have you run ANALYZE?

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Poor Yorick
>Sent: Wednesday, 17 April, 2019 08:48
>To: SQLite mailing list
>Subject: Re: [sqlite] [EXTERNAL] slow join, fast subselect
>
>On Wed, Apr 17, 2019 at 01:24:11PM +, David Raymond wrote:
>> Would you post what those explain query plans results are? All the
>other replies not withstanding I'm still curious as to why #2 would
>be faster (assuming "rowid" is indeed the actual rowid anyway)
>>
>> Also, is that a typo in #2, if you're not using b, why would you
>include it in the from clause? Wouldn't that introduce a whole bunch
>of duplicates? As in a copy of a.rowid for every single record in b?
>(Maybe my brain just hasn't finished warming up this morning)
>>
>> #1
>> select a.rowid
>> from a join b on a.rowid = b.rowid
>> where a.ref = $x
>>
>> #2
>> select a.rowid
>> from a,b
>> where a.ref = $x and a.rowid in (select rowid from b)
>>
>>
>
>3 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=?)}
>8 0 0 {SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)}
>
>2 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=? AND
>rowid=?)}
>7 0 0 {USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR}
>
>--
>Poor Yorick
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-17 Thread Poor Yorick
On Wed, Apr 17, 2019 at 01:24:11PM +, David Raymond wrote:
> Would you post what those explain query plans results are? All the other 
> replies not withstanding I'm still curious as to why #2 would be faster 
> (assuming "rowid" is indeed the actual rowid anyway)
> 
> Also, is that a typo in #2, if you're not using b, why would you include it 
> in the from clause? Wouldn't that introduce a whole bunch of duplicates? As 
> in a copy of a.rowid for every single record in b? (Maybe my brain just 
> hasn't finished warming up this morning)
> 
> #1
> select a.rowid
> from a join b on a.rowid = b.rowid
> where a.ref = $x 
> 
> #2
> select a.rowid
> from a,b
> where a.ref = $x and a.rowid in (select rowid from b)
> 
> 

3 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=?)}
8 0 0 {SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)}

2 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=? AND rowid=?)}
7 0 0 {USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR}

-- 
Poor Yorick
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-17 Thread David Raymond
Would you post what those explain query plans results are? All the other 
replies not withstanding I'm still curious as to why #2 would be faster 
(assuming "rowid" is indeed the actual rowid anyway)

Also, is that a typo in #2, if you're not using b, why would you include it in 
the from clause? Wouldn't that introduce a whole bunch of duplicates? As in a 
copy of a.rowid for every single record in b? (Maybe my brain just hasn't 
finished warming up this morning)

#1
select a.rowid
from a join b on a.rowid = b.rowid
where a.ref = $x 

#2
select a.rowid
from a,b
where a.ref = $x and a.rowid in (select rowid from b)



-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Poor Yorick
Sent: Wednesday, April 17, 2019 6:32 AM
To: SQLite mailing list
Subject: Re: [sqlite] [EXTERNAL] slow join, fast subselect


On Wed, Apr 17, 2019 at 10:15:31AM +, Hick Gunter wrote:
> Try EXPLAIN QUERY PLAN  or even EXPLAIN  to see what is going on 
> in each case.


I already have, of course.  The question is, how much effort would it be to get
sqlite choose the better query plan in the "join" case as well?

-- 
Poor Yorick
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-17 Thread Poor Yorick

On Wed, Apr 17, 2019 at 10:15:31AM +, Hick Gunter wrote:
> Try EXPLAIN QUERY PLAN  or even EXPLAIN  to see what is going on 
> in each case.


I already have, of course.  The question is, how much effort would it be to get
sqlite choose the better query plan in the "join" case as well?

-- 
Poor Yorick
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-17 Thread Hick Gunter
Try EXPLAIN QUERY PLAN  or even EXPLAIN  to see what is going on in 
each case.

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Poor Yorick
Gesendet: Mittwoch, 17. April 2019 10:56
An: sqlite-users@mailinglists.sqlite.org
Betreff: [EXTERNAL] [sqlite] slow join, fast subselect

I've used the following two test queries in a version of sqlite built against a 
recent checkout of trunk, and also another recent version of sqlite.  a.ref is 
indexed.  The subselect query is faster than the join query -- two orders of 
magnitude faster on a larger dataset.  Is sqlite missing some easy optimisation 
opportunity here?


select a.rowid
from a join b on a.rowid = b.rowid
where a.ref = $x


select a.rowid
from a,b
where a.ref = $x and a.rowid in (select rowid from b)


--
Poor Yorick

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users