[sqlite] JOIN vs IN

2011-11-11 Thread Fabian
Suppose I have two tables, and I want to have look for a value in the first
table, but display the columns from the second table. The most obvious way
would be joining them on rowid. But I don't need to SELECT any columns from
the first table, and it's a FTS4 table (which always joins a bit slower
than real tables), so I currently do:

SELECT * FROM table2 WHERE rowid IN (SELECT rowid FROM table1 WHERE amount
 500)

It returns the same results, but it doesn't seem much faster. Is there any
performance difference to be expected from using IN instead of JOIN, or
does SQLite internally rewrite JOIN queries to something similar as IN,
which would explain they perform nearly the same?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs IN

2011-11-11 Thread Pavel Ivanov
On Fri, Nov 11, 2011 at 11:58 AM, Petite Abeille
petite.abei...@gmail.com wrote:
 It returns the same results, but it doesn't seem much faster. Is there any
 performance difference to be expected from using IN instead of JOIN, or
 does SQLite internally rewrite JOIN queries to something similar as IN,
 which would explain they perform nearly the same?

 They should be equivalent in terms of cost. That said, you might want to use 
 the 'exists'  clause instead for the sake of clarity.

No, exists in this case will change query plan significantly and
performance can degrade drastically as a result.

For the original question: it's not that SQLite rewrites JOIN queries
to be as IN. It's just in your particular case both queries can be
executed in the same way: find all rows in table1 with the necessary
amount, for each row look into table2 and find rows with the same
rowid. Query with JOIN however could be executed differently - for
each row in table2 find all rows with the same rowid in table1 and
then check amount in them. SQLite decided that this query plan will be
less effecient.


Pavel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs IN

2011-11-11 Thread Petite Abeille

On Nov 12, 2011, at 1:58 AM, Pavel Ivanov wrote:

 No, exists in this case will change query plan significantly and
 performance can degrade drastically as a result.

Why would that be?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs IN

2011-11-11 Thread Darren Duncan

Fabian wrote:

Suppose I have two tables, and I want to have look for a value in the first
table, but display the columns from the second table. The most obvious way
would be joining them on rowid. But I don't need to SELECT any columns from
the first table, and it's a FTS4 table (which always joins a bit slower
than real tables), so I currently do:

SELECT * FROM table2 WHERE rowid IN (SELECT rowid FROM table1 WHERE amount

500)


It returns the same results, but it doesn't seem much faster. Is there any
performance difference to be expected from using IN instead of JOIN, or
does SQLite internally rewrite JOIN queries to something similar as IN,
which would explain they perform nearly the same?


The logical operation you are doing is a semijoin, filtering table2 by matching 
rows in table1 (if you used NOT IN instead you would be doing an antijoin).


A semijoin is most clearly expressed in SQL using the WHERE clause as you did, 
because the only purpose of table1 is to filter and not to return values from, 
as putting it in FROM would imply.


Now because SQL is bag oriented rather than set oriented, using IN also helps 
because you avoid generating extra duplicates, whereas if you used the join 
method instead, then if any row in one table matched multiple rows in the other 
(because you weren't joining on a (unique) key of both tables), the result could 
have duplicate table2 rows, which probably isn't what you want.


As to your performance question, any good DBMS should make both of your methods 
perform about the same, but that if they aren't the same, the IN version should 
always perform faster than the FROM version because with IN you only ever have 
to look at each row in table2 once; as soon as it finds any match you move on, 
rather than repeating for all possible matches.


Note that semijoins and antijoins are what you have both when you have another 
select after the NOT/IN and when you have a literal list, such as IN (1,2,3).


Note that any WHERE clause that consists just of ANDed equality tests, such as 
the common WHERE foo = 3 is also a trivial case of a semijoin where the table 
you are filtering on has exactly 1 row whose field value is 3, and ostensibly 
such WHERE clauses should also be optimizable.


-- Darren Duncan
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs IN

2011-11-11 Thread Pavel Ivanov
On Fri, Nov 11, 2011 at 8:06 PM, Petite Abeille
petite.abei...@gmail.com wrote:
 No, exists in this case will change query plan significantly and
 performance can degrade drastically as a result.

 Why would that be?

How would you rewrite the query using exists? The only thing I have in mind is

SELECT * FROM table2
WHERE exists (
SELECT 1 FROM table1
WHERE amount  500
AND table1.rowid = table2.rowid)

And this query will force SQLite to use the second query plan I talked
about: scan full table2 and for each row search in table1 for rows
with the same rowid and check if it has necessary amount. And this
plan will very likely be slower.


Pavel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users