I'm not an SQL guru by any means, so seeing this made a light go on. Does that mean it is generally a good idea to add "limit 1" to a select that you know should only return 1 row? I'm assuming this works because the engine can short-circuit as soon as it finds that first matching row.
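As a minimal sketch of that intuition (the table name `t` and the data are made up for illustration): LIMIT 1 caps the result at one row, so on a plain scan the engine can stop stepping as soon as the first match is found. Note that in the query from this thread the bigger win likely came from the planner choosing a different plan, not just from the early exit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(100000)))

# Without LIMIT, SQLite scans the whole table even though only one row matches.
# With LIMIT 1, the engine can stop as soon as the first matching row is found.
rows = conn.execute("SELECT x FROM t WHERE x = 54321 LIMIT 1").fetchall()
```

Whether this actually saves time depends on where the matching row sits in the scan order and on the chosen query plan, so it is worth checking with EXPLAIN QUERY PLAN rather than assuming.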
> -----Original Message-----
> From: Dani Va [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 31, 2007 8:30 AM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Performance problem for a simple select with range
>
> First, thanks, your suggestion worked.
>
> To my surprise, it was enough to add "limit 1" to the original query.
>
> So:
>
> select * from blocks, locations
> where locations.locid = blocks.locid
>   AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum
> limit 1
>
> takes about 1.398e-05 seconds, and
>
> select * from blocks, locations
> where locations.locid = blocks.locid
>   AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum
>
> takes about 3 seconds.
>
> Igor Tandetnik wrote:
> >
> > Dani Valevski <[EMAIL PROTECTED]> wrote:
> >>> I think I have a performance problem for a simple select with range.
> >>>
> >>> My Tables:
> >>> CREATE TABLE locations(
> >>>   locid INTEGER PRIMARY KEY,
> >>>   country TEXT,
> >>>   region TEXT,
> >>>   city TEXT,
> >>>   postalCode TEXT,
> >>>   latitude REAL,
> >>>   longitude REAL,
> >>>   dmaCode INTEGER,
> >>>   areaCode INTEGER)
> >>>
> >>> CREATE TABLE blocks(
> >>>   startIpNum INTEGER,
> >>>   endIpNum INTEGER,
> >>>   locId INTEGER)
> >>>
> >>> My Data:
> >>> http://www.maxmind.com/app/geolitecity
> >>> Blocks table has 2,776,436 rows
> >>> Locations table has 159,488 rows
> >>>
> >>> After inserting the data I run analyze.
> >>>
> >>> My Query:
> >>> select * from blocks, locations
> >>> where locations.locid = blocks.locid
> >>>   AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum
> >>> (replace ? with a number)
> >>>
> >>> Performance issues:
> >>> I use Python's sqlite3 module to run the query.
> >>> With this configuration it takes about 0.6 seconds to complete the
> >>> query. I think this is too slow. I could write a binary tree myself
> >>> and have searches like this take O(log(num_rows)), which is
> >>> 7*something_which_shouldnt_take_too_much. Am I wrong?
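The schema and query above can be reproduced with Python's sqlite3 module; the sketch below uses two hand-made rows rather than the MaxMind dataset, and the IP value is a made-up integer for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations(
    locid INTEGER PRIMARY KEY,
    country TEXT, region TEXT, city TEXT, postalCode TEXT,
    latitude REAL, longitude REAL, dmaCode INTEGER, areaCode INTEGER);
CREATE TABLE blocks(
    startIpNum INTEGER, endIpNum INTEGER, locId INTEGER);
""")
conn.execute("INSERT INTO locations(locid, country, city) VALUES (1, 'US', 'Example')")
conn.execute("INSERT INTO blocks VALUES (16777216, 16777471, 1)")

ip = 16777300  # hypothetical IP address already converted to an integer
# The original query from the thread, with "limit 1" and the IP bound twice.
row = conn.execute(
    "SELECT * FROM blocks, locations "
    "WHERE locations.locid = blocks.locId "
    "AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum LIMIT 1",
    (ip, ip)).fetchone()
```

Binding the parameter twice mirrors the `?`/`?` placeholders in the original; `fetchone()` returns the joined row, blocks columns first.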
> >
> > And what would you use as a key for this binary tree? I bet you would
> > utilize additional information that the DB engine doesn't have - that
> > your blocks don't overlap (they don't, right?). Try coming up with a
> > search strategy without making this assumption.
> >
> > Try this: create an index on startIpNum, and run a query like this:
> >
> > select * from blocks, locations
> > where blocks.startIpNum <= ? and blocks.locid = locations.locid
> > order by blocks.startIpNum desc limit 1;
> >
> > This gives you the record with the largest value of startIpNum that is
> > still smaller than the threshold, and should be very fast. It can
> > produce a false positive - make the additional check for (? <=
> > endIpNum) in your application code. Don't put this check into the
> > query though, or you will force it back into O(N) behavior in case your
> > target value doesn't fall within any block after all.
> >
> > Igor Tandetnik
>
> --
> View this message in context: http://www.nabble.com/Performance-problem-for-a-simple-select-with-range-tf4711654.html#a13509241
> Sent from the SQLite mailing list archive at Nabble.com.

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------
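Igor's suggestion - an index on startIpNum, a descending ORDER BY with LIMIT 1, and the endIpNum check done in application code - can be sketched as follows. The index name, the `lookup` helper, and the tiny dataset are illustrative, not from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations(locid INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE blocks(startIpNum INTEGER, endIpNum INTEGER, locId INTEGER);
CREATE INDEX idx_blocks_start ON blocks(startIpNum);
""")
conn.execute("INSERT INTO locations VALUES (1, 'US')")
conn.execute("INSERT INTO blocks VALUES (100, 200, 1)")
conn.execute("INSERT INTO blocks VALUES (300, 400, 1)")

def lookup(ip):
    # Largest startIpNum <= ip; with the index this is one seek plus one step.
    row = conn.execute(
        "SELECT * FROM blocks, locations "
        "WHERE blocks.startIpNum <= ? AND blocks.locId = locations.locid "
        "ORDER BY blocks.startIpNum DESC LIMIT 1", (ip,)).fetchone()
    # The candidate can be a false positive, so check endIpNum here,
    # in application code, rather than in the SQL - putting it in the
    # query would force a scan when the ip falls between blocks.
    if row is not None and ip <= row[1]:
        return row
    return None
```

A value inside a block (e.g. 150) returns its row; a value in the gap between blocks (e.g. 250) finds the candidate (100, 200) but fails the application-side check and returns None, which is exactly the false-positive case Igor describes.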