Hi, Everyone,

I tried to use btree but I only noticed a slight increase of
performance. I guess this module is better suited for more complex
boundaries and does not really help one-dimensional searches that I am
running. Because I am not sure if btree works for all python/sqlite
installations, I eventually gave up this approach.

I also tried many combinations of join, where, indexes and order of
tables. They all yield similar performance after I ANALYZE the tables.
This again proves sqlite's ability to use the best strategies for a
query, regardless of how it is written.

I also tried Don V Nielsen's method to expand the ranges, but this
does not work either because my ranges are large and there will be
billions of positions inside these ranges.

However, the last attempt motivated me to use a binning method. More
specifically, I

1. add a bin field to tableA to let bin=pos/10000, which groups
positions in bins of size 10000.

2. add a separate tableC with (bin INT, range_id INT) to store all
bins each range covers. More specifically, for each tange in tableB,
sbin=start/10000, ebin=end/10000, I insert sbin, sbin+1, sbin+2,...,
ebin+1 to tableC with the rowid of the range.

3. Index bin and range_id of tableC.

4. Change the query from

INSERT INTO ids SELECT id FROM tableA, tableB WHERE pos BETWEEN start AND end;

to

INSERT INTO ids SELECT id FROM tableA, tableB, tableC WHERE
tableA.bin=tableC.bin AND tableC.range_id=tableB.rowid AND pos BETWEEN
start AND end;

Using this method, a query that needed more than 40min to execute
could now finish in 2 minutes. The major change here is that for each
position, tableC contains a small number of ranges that might cover
it. Instead of searching all ranges, the new query only searches
ranges specified in tableC, which significantly boost the performance.

I am very happy that I finally find a solution for my problem and I
appreciate all the responses from the list. Thank you!

Bo

On Tue, Dec 27, 2011 at 11:57 AM, Bo Peng <ben....@gmail.com> wrote:
> I will report back my if I can use this module
> to optimize my query.
>
> Thanks,
> Bo
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to