On Nov 8, 2008, at 7:34 AM, BareFeet wrote:

>>> I'd like to fetch all the rows in "code" that don't exist in
>>> "companies":
>>>
>>> SELECT code.id FROM code,companies WHERE code.id IS NOT IN (SELECT
>>> code FROM companies);
>>
>> select code.id from code
>> where code.id not in (select code from companies);
>
> A technically better (eg faster) solution is to use "exists" (which
> only looks for the first instance) rather than "in" (which scans the
> whole table unnecessarily):
>
> select Code.ID from Code
> where not exists (select 1 from Companies where Companies.Code =
> Code.ID)
> ;

That is a subjective statement. It could be true if there are a
lot more rows in table 'Companies' than there are in 'Code' and there
is no index on Companies.Code. In general I think the NOT IN(...)
formulation will be faster though.

If there is an index on Companies.Code, then the two methods are more
or less equivalent. Otherwise, the NOT IN(...) method builds a transient
index so that testing "X NOT IN (...)" is an O(log(N)) operation. The
NOT EXISTS test will be O(N).

If there are a lot of rows in table 'Code' the test will need to be
run a lot of times so the expense of building the transient index
is probably worth it. If there are few rows in 'Code' then the test
will only have to run a few times. In this case the cost of building
the index might outweigh the benefits of it.

As always, if performance really matters, test both methods.

Dan.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to