Right, Jean-Christophe, about "moving too much data around": you'd want each of the inner selects to produce a vector of integer ids; "select *" is clearly not going to be very efficient. 200K rows is a very small table for this sort of test, in my view.

Were the faster intersections any faster than:
select * from T where lo {low condition} and hi {hi condition} and name = 'x'

where only one index is chosen? The answer would depend on cardinality. If [name] is highly differentiated, then name = 'x' would return very few rows and the query is fast using the standard approach. But if [name] were, say, FirstName, and there were hundreds of thousands of records containing "Joe" in the OP's table, then the name index won't help us very much.

That's why I am asking about a minimal perfect hash and the intersection of vectors. The minimal-perfect-hash, intersection-of-vectors approach might benefit queries against tables having several million rows. What I'm wondering (and lack the C skills to find out for myself) is whether SQLite's underlying algorithm for INTERSECT could be optimized with a minimal perfect hash. The algorithm would decide which vector of ids is the better candidate for building the MPH and which is the better candidate to be iterated, then iterate over the latter, testing for the existence of each id in the MPH using the optimized Exists() function supplied by the mph library for the particular type of mph being used.

The question, in scenarios where the vectors contain many items, is whether the overhead of creating the MPH and testing for existence is significantly less than the overhead of whatever INTERSECT is doing now when it intersects vectors of ids.

You have a ready-made acronym to advertise the speed, if it turns out to be faster: MPH. ;-)

Regards,
Tim Romano
Swarthmore PA

On Wed, May 12, 2010 at 8:52 PM, Jean-Christophe Deschamps <j...@q-e-d.org> wrote:

> >I would first create an INTEGER primary key and then place an index on
> >name, another on i_from, and another on i_to, and then see if the
> >approach below has any benefit.
> >
> >When I tried this with a geo-query it was actually slower than the
> >standard select, and I'm curious if that's always going to be the case.
> >It will come down to how efficient the INTERSECT of the vectors of
> >integers is. Each vector will have been the result of an index scan.
> >If INTERSECT were optimized (perhaps with a minimal perfect hash
> >function, http://cmph.sourceforge.net/index.html) this approach might
> >be useful.
>
> All three following queries use only simple indexes (PK, name, lo, hi).
>
> Query #1:
> select * from tst where lo < 345678
> intersect
> select * from tst where hi > 123456
> intersect
> select * from tst where name = 'aaelj';
>
> Query #2:
> select * from tst
> join (
>   select rowid from tst where lo < 345678
>   intersect
>   select rowid from tst where hi > 123456
> ) as interval
> on tst.rowid = interval.rowid and name = 'aaelj';
>
> Query #3:
> select * from tst
> join (
>   select rowid from tst where lo < 345678
>   intersect
>   select rowid from tst where hi > 123456
>   intersect
>   select rowid from tst where name = 'aaelj'
> ) as interval
> on tst.rowid = interval.rowid;
>
> On a 200K-row test table with random data, queries #2 and #3 were
> essentially identical, while #1 was twice as slow (moving too much data
> around, uselessly).
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
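P.S. To make the intersection idea concrete, here is a rough sketch (in Python, purely for brevity) of the algorithmic shape being proposed. A built-in hash set stands in for the minimal perfect hash; a real MPH (e.g. one built with cmph) would also need to keep the keys around, because an MPH maps *any* input to some slot, so an existence test must verify the key stored in that slot. Only the shape of the algorithm is the point here:

```python
# Sketch of hash-probe intersection of two rowid vectors.
# A Python set stands in for the minimal perfect hash; the
# build-probe structure is the same either way.

def intersect_ids(vec_a, vec_b):
    # Decide which vector is the better candidate for the probe
    # structure (the smaller one) and which to iterate (the larger one).
    if len(vec_a) > len(vec_b):
        vec_a, vec_b = vec_b, vec_a
    probe = set(vec_a)  # stand-in for MPH construction
    # Iterate the larger vector, testing each id for membership
    # (stand-in for the mph library's existence test).
    return [rid for rid in vec_b if rid in probe]

print(intersect_ids([1, 5, 9, 12], [5, 7, 9, 100, 12]))  # prints [5, 9, 12]
```

Whether this beats whatever INTERSECT does internally would, as said above, hinge on the relative cost of building the probe structure versus SQLite's current merge.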
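P.P.S. For anyone who wants to reproduce Jean-Christophe's comparison of query shapes, here is a self-contained Python/sqlite3 script. The table and column names and the constants come from his queries above; the row count (10K rather than 200K) and the random data are made up so the script runs quickly, so absolute timings will differ:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tst (name TEXT, lo INTEGER, hi INTEGER);
    CREATE INDEX tst_name ON tst(name);
    CREATE INDEX tst_lo ON tst(lo);
    CREATE INDEX tst_hi ON tst(hi);
""")

# Made-up test data: every 50th row gets the name used in the queries.
random.seed(0)
rows = [("aaelj" if i % 50 == 0 else "zzz%d" % (i % 97),
         random.randrange(1_000_000),
         random.randrange(1_000_000))
        for i in range(10_000)]
conn.executemany("INSERT INTO tst VALUES (?, ?, ?)", rows)

# Query #1: INTERSECT over full rows (moves whole rows around).
q1 = """SELECT * FROM tst WHERE lo < 345678
        INTERSECT SELECT * FROM tst WHERE hi > 123456
        INTERSECT SELECT * FROM tst WHERE name = 'aaelj'"""

# Query #2: INTERSECT over rowid vectors only, then join back.
q2 = """SELECT tst.* FROM tst
        JOIN (SELECT rowid FROM tst WHERE lo < 345678
              INTERSECT
              SELECT rowid FROM tst WHERE hi > 123456) AS interval
        ON tst.rowid = interval.rowid AND name = 'aaelj'"""

r1 = conn.execute(q1).fetchall()
r2 = conn.execute(q2).fetchall()
# Both formulations return the same rows (possibly in a different order).
print(len(r1), sorted(r1) == sorted(r2))
```

Wrap the two `execute` calls in timing code (e.g. `time.perf_counter()`) and raise the row count to see the gap JCD reported open up.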