>> Try using a JOIN instead. In fact, try both ways around:
>>
>>   DELETE FROM ids;
>>   INSERT INTO ids SELECT tableA.id FROM tableA JOIN tableB
>>     ON pos BETWEEN start AND end;
>>
>> then try
>>
>>   DELETE FROM ids;
>>   INSERT INTO ids SELECT tableA.id FROM tableB JOIN tableA
>>     ON pos BETWEEN start AND end;
>>
>> Which one is faster depends on some aspects of your data, and it's
>> easier for you to test it than for me to guess.
>
> If these two don't behave identically to each other, and don't behave
> identically to the original query, then there's a bug in SQLite. With an
> inner join it shouldn't matter to the optimizer which form your query is
> written in.
Thanks everyone for your quick replies. I trimmed my tables down to 10,000
and 1,000 rows respectively, and re-created a test database test1_fresh.db
WITHOUT analyzing it.

bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select tableA.id from tableA join tableB on pos between start and end;'
real    0m28.026s
user    0m27.994s
sys     0m0.016s
bpeng@bp8:annoDB % sqlite3 test1.db 'select count(*) from ids;'
32486
bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select tableA.id from tableB join tableA on pos between start and end;'
real    0m0.085s
user    0m0.061s
sys     0m0.010s
bpeng@bp8:annoDB % sqlite3 test1.db 'select count(*) from ids;'
32486

To my surprise, the order of the join has a significant impact on the
performance of the query. The EXPLAIN QUERY PLAN output shows why:

bpeng@bp8:annoDB % sqlite3 test1.db 'explain query plan insert into ids select tableA.id from tableA join tableB on pos between start and end;'
0|0|0|SCAN TABLE tableA (~1000000 rows)
0|1|1|SEARCH TABLE tableB USING COVERING INDEX tableB_idx (start<?) (~125000 rows)
bpeng@bp8:annoDB % sqlite3 test1.db 'explain query plan insert into ids select tableA.id from tableB join tableA on pos between start and end;'
0|0|0|SCAN TABLE tableB (~1000000 rows)
0|1|1|SEARCH TABLE tableA USING INDEX tableA_idx (pos>? AND pos<?) (~30000 rows)

A simple 'analyze' command seems to fix the problem:

bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % sqlite3 test1.db 'analyze'
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select tableA.id from tableA join tableB on pos between start and end;'
real    0m0.086s
user    0m0.062s
sys     0m0.011s

Anyway, 'from tableA join tableB' and 'from tableA, tableB' seem to have
the same performance:

bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % sqlite3 test1.db 'analyze'
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select tableA.id from tableA, tableB where pos between start and end;'
real    0m0.086s
user    0m0.062s
sys     0m0.010s
bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select tableA.id from tableA, tableB where pos between start and end;'
real    0m27.972s
user    0m27.918s
sys     0m0.035s

Going back to my original problem: my ranges do overlap and I only need
distinct IDs. The query I am using is a subquery in my application, and I
use DISTINCT to select distinct IDs. From the EXPLAIN QUERY PLAN output:

bpeng@bp8:annoDB % sqlite3 test1.db 'explain query plan insert into ids select tableA.id from tableA join tableB on pos between start and end;'
0|0|1|SCAN TABLE tableB (~1000 rows)
0|1|0|SEARCH TABLE tableA USING INDEX tableA_idx (pos>? AND pos<?) (~3000 rows)
bpeng@bp8:annoDB % sqlite3 test1.db 'explain query plan insert into ids select distinct tableA.id from tableA join tableB on pos between start and end;'
0|0|1|SCAN TABLE tableB (~1000 rows)
0|1|0|SEARCH TABLE tableA USING INDEX tableA_idx (pos>? AND pos<?) (~3000 rows)
0|0|0|USE TEMP B-TREE FOR DISTINCT

SQLite collects all ids before it uses the temp B-tree for DISTINCT. Is
there a way to tell SQLite to return an id as soon as it finds the first
range that the id falls into?
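On that last question: one rewrite that I believe allows this early exit is an EXISTS subquery. The correlated subquery can stop probing tableB at the first range containing a given pos, and each id is produced at most once, so no temp B-tree is needed. A quick sketch using Python's sqlite3 module, with a made-up miniature schema and invented data since I don't have the real ones:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableA (id INTEGER, pos INTEGER);
    CREATE TABLE tableB (start INTEGER, end INTEGER);
    CREATE INDEX tableA_idx ON tableA(pos);
""")
# Invented data: 1,000 positions and two overlapping ranges,
# so the plain join would yield duplicate ids.
con.executemany("INSERT INTO tableA VALUES (?, ?)",
                [(i, i) for i in range(1000)])
con.executemany("INSERT INTO tableB VALUES (?, ?)", [(0, 500), (250, 750)])

distinct_join = sorted(con.execute(
    "SELECT DISTINCT tableA.id FROM tableA JOIN tableB"
    " ON pos BETWEEN start AND end").fetchall())

# EXISTS form: the subquery stops at the first range containing pos.
exists_form = sorted(con.execute(
    "SELECT id FROM tableA WHERE EXISTS"
    " (SELECT 1 FROM tableB WHERE pos BETWEEN start AND end)").fetchall())

assert distinct_join == exists_form
print(len(exists_form))  # ids 0..750 -> 751
```

Note this flips the loop order (tableA becomes the outer scan), so whether it actually wins depends on which table is larger and how selective the indexes are; it's worth timing against the DISTINCT form on real data.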
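And for anyone who wants to reproduce the join-order experiment above without the original database, here is a minimal sketch (schema, index shapes, and data are all invented) that prints the plans for both orders before and after ANALYZE:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableA (id INTEGER, pos INTEGER);
    CREATE TABLE tableB (start INTEGER, end INTEGER);
    CREATE INDEX tableA_idx ON tableA(pos);
    CREATE INDEX tableB_idx ON tableB(start, end);
""")
# Invented data: 10,000 positions, 100 non-overlapping ranges of width 10.
con.executemany("INSERT INTO tableA VALUES (?, ?)",
                [(i, i) for i in range(10000)])
con.executemany("INSERT INTO tableB VALUES (?, ?)",
                [(i * 100, i * 100 + 9) for i in range(100)])

queries = [
    "SELECT tableA.id FROM tableA JOIN tableB ON pos BETWEEN start AND end",
    "SELECT tableA.id FROM tableB JOIN tableA ON pos BETWEEN start AND end",
]
for sql in queries:
    print(sql)
    for row in con.execute("EXPLAIN QUERY PLAN " + sql):
        print("  ", row)

con.execute("ANALYZE")  # gathers per-index statistics into sqlite_stat1
for sql in queries:
    print("after ANALYZE:", sql)
    for row in con.execute("EXPLAIN QUERY PLAN " + sql):
        print("  ", row)

# Both orders return the same 1,000 matching ids.
n = con.execute("SELECT count(*) FROM (" + queries[0] + ")").fetchone()[0]
print(n)
```

If running ANALYZE isn't convenient, SQLite also documents that CROSS JOIN is never reordered by the planner (the left table is always the outer loop), so 'FROM tableB CROSS JOIN tableA' pins the fast nesting by hand.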
Bo
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users