Re: [sqlite] What is the most efficient way to get the close by numbers?
On Sat, Aug 21, 2010 at 10:45 PM, ve3meowrote: > "Eric Smith" ... >> I haven't used it myself, but I'm pretty sure this is what the R*tree >> module was designed for: > > I have not used it either but was intrigued by your suggestion. Looking into > it, my sense was that it would be advantageous for a 2-dimension or more > search and I think this was borne out by my experiments, albeit with my very > limited knowledge and experience. I hasten to add that maybe my queries were > not the best designed. > > I created an R-tree virtual table with the minimum number of columns: id, > xmin, xmax and populated it with the rowid from Peng's table A and > position-10 in xmin and position+10 in xmax. Thus any value lying between > xmax and xmin is no more than a distance of 10 from the point pointed to in > Table A by id. Indeed, a simple select on a single value between xmax and > xmin returned rows from the virtual table faster than the correspondingly > simple select on the real table with an index on position (something like > 6ms vs. 10 ms). > > However, on the 10,000 row test table on which I reported earlier that Jim's > 'between' query was fastest at ~2.3s, the best I could get by working an > R-tree virtual table into the mix was ~30s, and that with an index on Name > or Name+Position. Without the index... ~60s, not much better than Peng's > original indexless query and way more complicated. > > I think the penalty is in the extra JOIN required - 3 tables instead of 2 - > with the speed advantage on the 'between' constraint being swamped by the > volume of intermediate rows. I don't quite understand why there are 3 tables with R-tree. Would you please show me what query you used? Thank you very much. -- Regards, Peng ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the most efficient way to get the close by numbers?
"Eric Smith" ... > I haven't used it myself, but I'm pretty sure this is what the R*tree > module was designed for: I have not used it either but was intrigued by your suggestion. Looking into it, my sense was that it would be advantageous for a 2-dimension or more search and I think this was borne out by my experiments, albeit with my very limited knowledge and experience. I hasten to add that maybe my queries were not the best designed. I created an R-tree virtual table with the minimum number of columns: id, xmin, xmax and populated it with the rowid from Peng's table A and position-10 in xmin and position+10 in xmax. Thus any value lying between xmax and xmin is no more than a distance of 10 from the point pointed to in Table A by id. Indeed, a simple select on a single value between xmax and xmin returned rows from the virtual table faster than the correspondingly simple select on the real table with an index on position (something like 6ms vs. 10 ms). However, on the 10,000 row test table on which I reported earlier that Jim's 'between' query was fastest at ~2.3s, the best I could get by working an R-tree virtual table into the mix was ~30s, and that with an index on Name or Name+Position. Without the index... ~60s, not much better than Peng's original indexless query and way more complicated. I think the penalty is in the extra JOIN required - 3 tables instead of 2 - with the speed advantage on the 'between' constraint being swamped by the volume of intermediate rows. Tom ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the most efficient way to get the close by numbers?
Thank you! I don't find which sqlite document describes 'between ... and ...'. Would you please point to me where it is documented? On Fri, Aug 20, 2010 at 6:06 PM, Jim Morriswrote: > If there is an index on (name, position) the a where like below might > use it. > > A1.name=A2.name and A2.position between( A1.position - 10, A1.position + 10 ) > > > On 8/20/2010 3:54 PM, Peng Yu wrote: >> Hi, >> >> I have the following code to search for neighboring positions >> (distance<=10). But it is slow for large data set. I'm wondering what >> is the most efficient query for such a search. Note that I don't >> create an index, as I'm not sure what index to create on table A. >> >> $ cat main.sql >> #!/usr/bin/env bash >> >> rm -f main.db >> sqlite3 main.db<> >> create table A (name text, position integer); >> insert into A values('a', 1); >> insert into A values('a', 5); >> insert into A values('a', 21); >> insert into A values('b', 3); >> insert into A values('b', 15); >> insert into A values('b', 19); >> >> .mode column >> .headers on >> .echo on >> select * from A as A1, A as A2 where A1.name=A2.name and >> abs(A1.position - A2.position)<= 10 and A1.position != A2.position; >> >> EOF >> > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Regards, Peng ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the most efficient way to get the close by numbers?
Peng Yu wrote: > I have the following code to search for neighboring positions > (distance <=10). But it is slow for large data set. I'm wondering what > is the most efficient query for such a search. Note that I don't > create an index, as I'm not sure what index to create on table A. I haven't used it myself, but I'm pretty sure this is what the R*tree module was designed for: http://www.sqlite.org/rtree.html -- Eric A. Smith Drill for oil? You mean drill into the ground to try and find oil? You're crazy. -- Drillers who Edwin L. Drake tried to enlist to his project to drill for oil in 1859. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the most efficient way to get the close by numbers?
On 20 Aug 2010, at 11:54pm, Peng Yu wrote: > select * from A as A1, A as A2 where A1.name=A2.name and > abs(A1.position - A2.position) <= 10 and A1.position != A2.position; If you're doing this a lot, work out which chunks of 20 each point is in. You only need to compare a point with points in two chunks of 20: the one below and the one above. So you can reject almost all the points immediately. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] What is the most efficient way to get the close by numbers?
If there is an index on (name, position) the a where like below might use it. A1.name=A2.name and A2.position between( A1.position - 10, A1.position + 10 ) On 8/20/2010 3:54 PM, Peng Yu wrote: > Hi, > > I have the following code to search for neighboring positions > (distance<=10). But it is slow for large data set. I'm wondering what > is the most efficient query for such a search. Note that I don't > create an index, as I'm not sure what index to create on table A. > > $ cat main.sql > #!/usr/bin/env bash > > rm -f main.db > sqlite3 main.db< > create table A (name text, position integer); > insert into A values('a', 1); > insert into A values('a', 5); > insert into A values('a', 21); > insert into A values('b', 3); > insert into A values('b', 15); > insert into A values('b', 19); > > .mode column > .headers on > .echo on > select * from A as A1, A as A2 where A1.name=A2.name and > abs(A1.position - A2.position)<= 10 and A1.position != A2.position; > > EOF > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] What is the most efficient way to get the close by numbers?
Hi, I have the following code to search for neighboring positions (distance <=10). But it is slow for large data set. I'm wondering what is the most efficient query for such a search. Note that I don't create an index, as I'm not sure what index to create on table A. $ cat main.sql #!/usr/bin/env bash rm -f main.db sqlite3 main.db