>> Try using a JOIN instead. In fact, try both ways around:
>>
>>   DELETE FROM ids;
>>   INSERT INTO ids SELECT tableA.id FROM tableA JOIN tableB
>>     ON pos BETWEEN start AND end;
>>
>> then try
>>
>>   DELETE FROM ids;
>>   INSERT INTO ids SELECT tableA.id FROM tableB JOIN tableA
>>     ON pos BETWEEN start AND end;
>>
>> Which one is faster depends on some aspects of your data, and it's
>> easier for you to test it than for me to guess.
>
> If these two don't behave identically to each other, and don't behave
> identically to the original query, then there's a bug in SQLite. With an
> inner join it shouldn't matter to the optimizer which form your query is
> written in.
Thanks everyone for your quick replies. I trimmed my tables down to 10,000
and 1,000 rows respectively, and re-created a test database test1_fresh.db
WITHOUT analyzing it.

bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select tableA.id from tableA join tableB on pos between start and end;'
real    0m28.026s
user    0m27.994s
sys     0m0.016s
bpeng@bp8:annoDB % sqlite3 test1.db 'select count(*) from ids;'
32486
bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select tableA.id from tableB join tableA on pos between start and end;'
real    0m0.085s
user    0m0.061s
sys     0m0.010s
bpeng@bp8:annoDB % sqlite3 test1.db 'select count(*) from ids;'
32486

To my surprise, the order of the join has a significant impact on the
performance of the query. The EXPLAIN QUERY PLAN output shows why:

bpeng@bp8:annoDB % sqlite3 test1.db 'explain query plan insert into ids select tableA.id from tableA join tableB on pos between start and end;'
0|0|0|SCAN TABLE tableA (~1000000 rows)
0|1|1|SEARCH TABLE tableB USING COVERING INDEX tableB_idx (start<?) (~125000 rows)
bpeng@bp8:annoDB % sqlite3 test1.db 'explain query plan insert into ids select tableA.id from tableB join tableA on pos between start and end;'
0|0|0|SCAN TABLE tableB (~1000000 rows)
0|1|1|SEARCH TABLE tableA USING INDEX tableA_idx (pos>? AND pos<?) (~30000 rows)

A simple 'analyze' command seems to fix the problem:

bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % sqlite3 test1.db 'analyze'
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select tableA.id from tableA join tableB on pos between start and end;'
real    0m0.086s
user    0m0.062s
sys     0m0.011s

Anyway, 'from tableA join tableB' and 'from tableA, tableB' seem to have
the same performance:

bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % sqlite3 test1.db 'analyze'
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select tableA.id from tableA, tableB where pos between start and end;'
real    0m0.086s
user    0m0.062s
sys     0m0.010s
bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select tableA.id from tableA, tableB where pos between start and end;'
real    0m27.972s
user    0m27.918s
sys     0m0.035s

Going back to my original problem: my ranges do overlap and I only need
distinct IDs. The query I am using is a subquery in my application, and I
use DISTINCT to select distinct IDs. From the EXPLAIN QUERY PLAN output:

bpeng@bp8:annoDB % sqlite3 test1.db 'explain query plan insert into ids select tableA.id from tableA join tableB on pos between start and end;'
0|0|1|SCAN TABLE tableB (~1000 rows)
0|1|0|SEARCH TABLE tableA USING INDEX tableA_idx (pos>? AND pos<?) (~3000 rows)
bpeng@bp8:annoDB % sqlite3 test1.db 'explain query plan insert into ids select distinct tableA.id from tableA join tableB on pos between start and end;'
0|0|1|SCAN TABLE tableB (~1000 rows)
0|1|0|SEARCH TABLE tableA USING INDEX tableA_idx (pos>? AND pos<?) (~3000 rows)
0|0|0|USE TEMP B-TREE FOR DISTINCT

SQLite collects all ids before it uses the temp B-tree for DISTINCT. Is
there a way to tell SQLite to return an id as soon as it finds the first
range that the id falls into?
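On that last question: one rewrite that I believe allows this early exit is an EXISTS subquery. The correlated subquery can stop probing tableB at the first range containing a given pos, and each id is produced at most once, so no temp B-tree is needed. A quick sketch using Python's sqlite3 module, with a made-up miniature schema and invented data since I don't have the real ones:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableA (id INTEGER, pos INTEGER);
    CREATE TABLE tableB (start INTEGER, end INTEGER);
    CREATE INDEX tableA_idx ON tableA(pos);
""")
# Invented data: 1,000 positions and two overlapping ranges,
# so the plain join would yield duplicate ids.
con.executemany("INSERT INTO tableA VALUES (?, ?)",
                [(i, i) for i in range(1000)])
con.executemany("INSERT INTO tableB VALUES (?, ?)", [(0, 500), (250, 750)])

distinct_join = sorted(con.execute(
    "SELECT DISTINCT tableA.id FROM tableA JOIN tableB"
    " ON pos BETWEEN start AND end").fetchall())

# EXISTS form: the subquery stops at the first range containing pos.
exists_form = sorted(con.execute(
    "SELECT id FROM tableA WHERE EXISTS"
    " (SELECT 1 FROM tableB WHERE pos BETWEEN start AND end)").fetchall())

assert distinct_join == exists_form
print(len(exists_form))  # ids 0..750 -> 751
```

Note this flips the loop order (tableA becomes the outer scan), so whether it actually wins depends on which table is larger and how selective the indexes are; it's worth timing against the DISTINCT form on real data.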
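And for anyone who wants to reproduce the join-order experiment above without the original database, here is a minimal sketch (schema, index shapes, and data are all invented) that prints the plans for both orders before and after ANALYZE:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableA (id INTEGER, pos INTEGER);
    CREATE TABLE tableB (start INTEGER, end INTEGER);
    CREATE INDEX tableA_idx ON tableA(pos);
    CREATE INDEX tableB_idx ON tableB(start, end);
""")
# Invented data: 10,000 positions, 100 non-overlapping ranges of width 10.
con.executemany("INSERT INTO tableA VALUES (?, ?)",
                [(i, i) for i in range(10000)])
con.executemany("INSERT INTO tableB VALUES (?, ?)",
                [(i * 100, i * 100 + 9) for i in range(100)])

queries = [
    "SELECT tableA.id FROM tableA JOIN tableB ON pos BETWEEN start AND end",
    "SELECT tableA.id FROM tableB JOIN tableA ON pos BETWEEN start AND end",
]
for sql in queries:
    print(sql)
    for row in con.execute("EXPLAIN QUERY PLAN " + sql):
        print("  ", row)

con.execute("ANALYZE")  # gathers per-index statistics into sqlite_stat1
for sql in queries:
    print("after ANALYZE:", sql)
    for row in con.execute("EXPLAIN QUERY PLAN " + sql):
        print("  ", row)

# Both orders return the same 1,000 matching ids.
n = con.execute("SELECT count(*) FROM (" + queries[0] + ")").fetchone()[0]
print(n)
```

If running ANALYZE isn't convenient, SQLite also documents that CROSS JOIN is never reordered by the planner (the left table is always the outer loop), so 'FROM tableB CROSS JOIN tableA' pins the fast nesting by hand.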
Bo
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users