Dani Valevski <[EMAIL PROTECTED]> wrote:
I think I have a performance problem for a simple select with range.
My Tables:
CREATE TABLE locations(
locid INTEGER PRIMARY KEY,
country TEXT,
region TEXT,
city TEXT,
postalCode TEXT,
latitude REAL,
longitude REAL,
dmaCode INTEGER,
areaCode INTEGER)
CREATE TABLE blocks(
startIpNum INTEGER,
endIpNum INTEGER,
locId INTEGER)
My Data:
http://www.maxmind.com/app/geolitecity
Blocks table has 2,776,436 rows
Locations table has 159,488 rows
After inserting the data I run analyze.
My Query:
select * from blocks,locations where locations.locid = blocks.locid
AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum
(replace ? with a number)
Performance issues:
I use python's sqlite3 module to run the query.
With this configuration it takes about 0.6 seconds to complete the
query. I
think this is too slow. I could write a binary tree myself and have
searches
like this take, O(log(num_rows)) which is
7*something_which_shouldnt_take_too_much. Am I wrong?
And what would you use as a key for this binary tree? I bet you would
utilize additional information that the DB engine doesn't have - that
your blocks don't overlap (they don't, right?) Try coming up with a
search strategy without making this assumption.
Try this: create an index on startIpNum, and run a query like this:
select * from blocks, locations
where blocks.startIpNum <= ? and blocks.locid = locations.locid
order by blocks.startIpNum desc limit 1;
This gives you the record with the largest value of startIpNum that is
still smaller than the threshold, and should be very fast. It can
produce a false positive - make the additional check for (? <=
startIpEnd) in your application code. Don't put this check into the
query though, or you will force it back into O(N) behavior in case your
target value doesn't fall within any block after all.
Igor Tandetnik
-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------