Re: [sqlite] Performance problem for a simple select with range

2007-10-31 Thread Dani Va


Igor Tandetnik wrote:
> 
> Try searching for a value that doesn't fall into any block - you'll 
> likely find that the query takes a noticeable time to produce zero 
> records. Pick a large value that's greater than all startIpNum's.
> 

Yes, you are right. That's why I'm going with the original query you
suggested. 

Thanks again
Dani



-- 
View this message in context: 
http://www.nabble.com/Performance-problem-for-a-simple-select-with-range-tf4711654.html#a13517991
Sent from the SQLite mailing list archive at Nabble.com.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] Performance problem for a simple select with range

2007-10-31 Thread Doug
I'm not an SQL guru by any means, so seeing this made a light go on.  Does
that mean it is a good idea in the general case to always add "limit 1" to a
select that you know should only return 1 row?  I'm assuming this works
because the engine can short-cut out as soon as it finds that first matching
row.

> -Original Message-
> From: Dani Va [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 31, 2007 8:30 AM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Performance problem for a simple select with range
> 
> 
> First, thanks, your suggestion worked.
> 
> To my surprise, it was enough to add "limit 1" to the original query.
> 
> So:
> 
> select * from blocks,locations where locations.locid = blocks.locid AND ?
>=
> blocks.startIpNum AND ? <= blocks.endIpNum limit 1
> takes about 1.398-005 seconds
> 
> and
> 
> select * from blocks,locations where locations.locid = blocks.locid AND ?
>=
> blocks.startIpNum AND ? <= blocks.endIpNum
> takes about 3 seconds.
> 
> 
> 
> 
> 
> Igor Tandetnik wrote:
> >
> > Dani Valevski <[EMAIL PROTECTED]> wrote:
> >>> I think I have a performance problem for a simple select with range.
> >>>
> >>> My Tables:
> >>> CREATE TABLE locations(
> >>>locidINTEGER PRIMARY KEY,
> >>>country TEXT,
> >>>regionTEXT,
> >>>cityTEXT,
> >>>postalCode TEXT,
> >>>latitude REAL,
> >>>longitude REAL,
> >>>dmaCode INTEGER,
> >>>areaCode INTEGER)
> >>>
> >>> CREATE TABLE blocks(
> >>>startIpNum INTEGER,
> >>>endIpNum INTEGER,
> >>>locId INTEGER)
> >>>
> >>> My Data:
> >>> http://www.maxmind.com/app/geolitecity
> >>> Blocks table has 2,776,436 rows
> >>> Locations table has 159,488 rows
> >>>
> >>> After inserting the data I run analyze.
> >>>
> >>> My Query:
> >>> select * from blocks,locations where locations.locid = blocks.locid
> >>> AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum
> >>> (replace ? with a number)
> >>>
> >>> Performance issues:
> >>> I use python's sqlite3 module to run the query.
> >>> With this configuration it takes about 0.6 seconds to complete the
> >>> query. I
> >>> think this is too slow. I could write a binary tree myself and have
> >>> searches
> >>> like this take, O(log(num_rows)) which is
> >>> 7*something_which_shouldnt_take_too_much. Am I wrong?
> >
> > And what would you use as a key for this binary tree? I bet you would
> > utilize additional information that the DB engine doesn't have - that
> > your blocks don't overlap (they don't, right?) Try coming up with a
> > search strategy without making this assumption.
> >
> > Try this: create an index on startIpNum, and run a query like this:
> >
> > select * from blocks, locations
> > where blocks.startIpNum <= ? and blocks.locid = locations.locid
> > order by blocks.startIpNum desc limit 1;
> >
> > This gives you the record with the largest value of startIpNum that is
> > still smaller than the threshold, and should be very fast. It can
> > produce a false positive - make the additional check for (? <=
> > startIpEnd) in your application code. Don't put this check into the
> > query though, or you will force it back into O(N) behavior in case your
> > target value doesn't fall within any block after all.
> >
> > Igor Tandetnik
> >
> >
> >

-
> > To unsubscribe, send email to [EMAIL PROTECTED]
> >

-
> >
> >
> >
> 
> --
> View this message in context:
http://www.nabble.com/Performance-problem-for-a-
> simple-select-with-range-tf4711654.html#a13509241
> Sent from the SQLite mailing list archive at Nabble.com.
> 
> 
>

-
> To unsubscribe, send email to [EMAIL PROTECTED]
>

-


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Performance problem for a simple select with range

2007-10-31 Thread Dani Va

First, thanks, your suggestion worked. 

To my surprise, it was enough to add "limit 1" to the original query.

So:

select * from blocks,locations where locations.locid = blocks.locid AND ? >=
blocks.startIpNum AND ? <= blocks.endIpNum limit 1
takes about 1.398-005 seconds 

and 

select * from blocks,locations where locations.locid = blocks.locid AND ? >=
blocks.startIpNum AND ? <= blocks.endIpNum 
takes about 3 seconds.





Igor Tandetnik wrote:
> 
> Dani Valevski <[EMAIL PROTECTED]> wrote:
>>> I think I have a performance problem for a simple select with range.
>>>
>>> My Tables:
>>> CREATE TABLE locations(
>>>locidINTEGER PRIMARY KEY,
>>>country TEXT,
>>>regionTEXT,
>>>cityTEXT,
>>>postalCode TEXT,
>>>latitude REAL,
>>>longitude REAL,
>>>dmaCode INTEGER,
>>>areaCode INTEGER)
>>>
>>> CREATE TABLE blocks(
>>>startIpNum INTEGER,
>>>endIpNum INTEGER,
>>>locId INTEGER)
>>>
>>> My Data:
>>> http://www.maxmind.com/app/geolitecity
>>> Blocks table has 2,776,436 rows
>>> Locations table has 159,488 rows
>>>
>>> After inserting the data I run analyze.
>>>
>>> My Query:
>>> select * from blocks,locations where locations.locid = blocks.locid
>>> AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum
>>> (replace ? with a number)
>>>
>>> Performance issues:
>>> I use python's sqlite3 module to run the query.
>>> With this configuration it takes about 0.6 seconds to complete the
>>> query. I
>>> think this is too slow. I could write a binary tree myself and have
>>> searches
>>> like this take, O(log(num_rows)) which is
>>> 7*something_which_shouldnt_take_too_much. Am I wrong?
> 
> And what would you use as a key for this binary tree? I bet you would 
> utilize additional information that the DB engine doesn't have - that 
> your blocks don't overlap (they don't, right?) Try coming up with a 
> search strategy without making this assumption.
> 
> Try this: create an index on startIpNum, and run a query like this:
> 
> select * from blocks, locations
> where blocks.startIpNum <= ? and blocks.locid = locations.locid
> order by blocks.startIpNum desc limit 1;
> 
> This gives you the record with the largest value of startIpNum that is 
> still smaller than the threshold, and should be very fast. It can 
> produce a false positive - make the additional check for (? <= 
> startIpEnd) in your application code. Don't put this check into the 
> query though, or you will force it back into O(N) behavior in case your 
> target value doesn't fall within any block after all.
> 
> Igor Tandetnik 
> 
> 
> -
> To unsubscribe, send email to [EMAIL PROTECTED]
> -
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Performance-problem-for-a-simple-select-with-range-tf4711654.html#a13509241
Sent from the SQLite mailing list archive at Nabble.com.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Performance problem for a simple select with range

2007-10-29 Thread drh
"Dani Valevski" <[EMAIL PROTECTED]> wrote:
> I think I have a performance problem for a simple select with range.
> 
> My Tables:
> CREATE TABLE locations(locidINTEGER PRIMARY KEY, ...);
> 
> CREATE TABLE blocks(
> startIpNum INTEGER,
> endIpNum INTEGER,
> locId INTEGER)
> 
> My Data:
> Blocks table has 2,776,436 rows
> Locations table has 159,488 rows
> 
> My Query:
> select * from blocks,locations where locations.locid = blocks.locid AND ? >=
> blocks.startIpNum AND ? <= blocks.endIpNum
> (replace ? with a number)
> 

To do searches of this kind with maximum efficiency, you normally
want to use a 1-dimensional R-Tree index.  SQLite does not support
RTree indices natively, though it is conceivable that you could
write a RTree virtual table extension for SQLite.

Without an RTree index, and unless you can exploit the distribution
of data in the blocks table, you really cannot do much better than a
full table scan on blocks with an indexed lookup of locations for
each matching block.  That is probably what is happening on your
original query before you added indices.

--
D. Richard Hipp <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Performance problem for a simple select with range

2007-10-29 Thread Kees Nuyt
[Default] On Mon, 29 Oct 2007 15:25:18 +0200, "Dani Valevski"
<[EMAIL PROTECTED]> wrote:

>I think I have a performance problem for a simple select with range.
>
>My Tables:
>CREATE TABLE locations(
>locidINTEGER PRIMARY KEY,
>country TEXT,
>regionTEXT,
>cityTEXT,
>postalCode TEXT,
>latitude REAL,
>longitude REAL,
>dmaCode INTEGER,
>areaCode INTEGER)
>
>CREATE TABLE blocks(
>startIpNum INTEGER,
>endIpNum INTEGER,
>locId INTEGER)
>
>My Data:
>http://www.maxmind.com/app/geolitecity
>Blocks table has 2,776,436 rows
>Locations table has 159,488 rows
>
>After inserting the data I run analyze.
>
>My Query:
>select * from blocks,locations where locations.locid = blocks.locid AND ? >=
>blocks.startIpNum AND ? <= blocks.endIpNum
>(replace ? with a number)
>
>Disclaimer:
>I'm a bit new to databases.
>
>Performance issues:
>I use python's sqlite3 module to run the query.
>With this configuration it takes about 0.6 seconds to complete the query. I
>think this is too slow. I could write a binary tree myself and have searches
>like this take, O(log(num_rows)) which is
>7*something_which_shouldnt_take_too_much. Am I wrong? (see the disclaimer)
>
>Anyway, I thought the problem was that startIpNum, endIpNum are not indexed.
>So I added indices for them (even tried indexing them both). This only makes
>the query take about 3 seconds.
>Ideas anyone?
>
>Source:
>is attached.
>
>Thank you for your help

Just some suggestions:
Index locid in both tables, and rewrite

> select *
>   from blocks,locations 
>  where locations.locid = blocks.locid
>  AND ? >= blocks.startIpNum 
>  AND ? <= blocks.endIpNum

to:

 select *
   from blocks
  INNER JOIN locations USING (locid)
  where ? >= blocks.startIpNum 
AND ? <= blocks.endIpNum

or:

 select *
   from locations
  INNER JOIN blocks USING (locid)
  where ? >= blocks.startIpNum 
AND ? <= blocks.endIpNum

HTH
-- 
  (  Kees Nuyt
  )
c[_]

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Performance problem for a simple select with range

2007-10-29 Thread Dani Valevski
I think I have a performance problem for a simple select with range.

My Tables:
CREATE TABLE locations(
locidINTEGER PRIMARY KEY,
country TEXT,
regionTEXT,
cityTEXT,
postalCode TEXT,
latitude REAL,
longitude REAL,
dmaCode INTEGER,
areaCode INTEGER)

CREATE TABLE blocks(
startIpNum INTEGER,
endIpNum INTEGER,
locId INTEGER)

My Data:
http://www.maxmind.com/app/geolitecity
Blocks table has 2,776,436 rows
Locations table has 159,488 rows

After inserting the data I run analyze.

My Query:
select * from blocks,locations where locations.locid = blocks.locid AND ? >=
blocks.startIpNum AND ? <= blocks.endIpNum
(replace ? with a number)

Disclaimer:
I'm a bit new to databases.

Performance issues:
I use python's sqlite3 module to run the query.
With this configuration it takes about 0.6 seconds to complete the query. I
think this is too slow. I could write a binary tree myself and have searches
like this take, O(log(num_rows)) which is
7*something_which_shouldnt_take_too_much. Am I wrong? (see the disclaimer)

Anyway, I thought the problem was that startIpNum, endIpNum are not indexed.
So I added indices for them (even tried indexing them both). This only makes
the query take about 3 seconds.
Ideas anyone?

Source:
is attached.

Thank you for your help





-- 
Dani
http://daniva.googlepages.com


-- 
Dani
http://daniva.googlepages.com
'''geolite
GeoLite City is a free IP to city database provided by MaxMind. 
They provide a C API (and a python wrapper) for the database.
If you can't compile the C sources on your server (or get a binary 
version), this script might be helpful for you. 
The script puts the geoip data in a sqllite database, and provides
interfaces for updating and searching the database.

To use this script, get the database in CSV format:
http://www.maxmind.com/app/geolitecity

You also need to have python 2.5 for this script (sqlite3 is used)
'''

import sqlite3 as sqlite
import os

def dottedQuadToNum(ip):
"convert decimal dotted quad string to long integer"

hexn = ''.join(["%02X" % long(i) for i in ip.split('.')])
return long(hexn, 16)


def cursorToDict(cursor):
val = cursor.next()
return dict([(cursor.description[i][0],val[i]) for i in xrange(len(cursor.description))])

def test():
import sqlite3
from time import clock
x = sqlite3.connect('geolite.db')
y = x.cursor()
ip = dottedQuadToNum("84.108.189.94")
res = y.execute('select * from blocks,locations where locations.locid = blocks.locid AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum', [ip,ip])
begin = clock()
f = res.next() 
end = clock()
y.close()
x.close()
return end-begin, f

def test2():
from time import clock
x = GeoLiteDB()
x.connect();
begin = clock()
x.ipLocation("84.108.189.94");
end = clock()
x.close()
return end - begin


def createDB(dbPath = 'geolite.db', locationsPath='GeoLiteCity-Location.csv', blocksPath='GeoLiteCity-Blocks.csv', warnOnDelete = True):
if os.path.exists(dbPath):
if warnOnDelete:
	print "file %s will be deleted. Press any key to continue, or 'n' to abort..." % (os.path.abspath(dbPath))
	if getch() == 'n':
	print 'aborted.'
	return None
	os.remove(os.path.abspath(dbPath))	
conn = sqlite.connect(dbPath)
cursor = conn.cursor()
try:
cursor.execute('''CREATE TABLE locations(
locid	INTEGER PRIMARY KEY,
country TEXT,
region	TEXT,
city	TEXT,
postalCode TEXT,
latitude REAL,
longitude REAL,
dmaCode INTEGER,
areaCode INTEGER)''')

	cursor.execute('''CREATE TABLE blocks(
startIpNum INTEGER,
endIpNum INTEGER,
locId INTEGER)''')

	locations = file(locationsPath,'r')
	print ('parsing locations. This will a while.')
	print locations.readline().strip() #should print copyright note
print locations.readline().strip() #should print column names
lines = ([x.strip('"') for x in line.strip().split(',')] for line in locations.xreadlines())
cursor.executemany('insert into locations values (?,?,?,?,?,?,?,?,?)', lines)
	locations.close()

	blocks = file(blocksPath,'r')
	print ('parsing blocks. This will take longer.')
	print blocks.readline().strip() #should print copyright note
print blocks.readline().strip() #should print column names
lines = ([x.strip('"') for x in line.strip().split(',')] for line in blocks.xreadlines())
	cursor.executemany('insert into blocks values (?,?,?)', lines)	
	blocks.close()

#cursor.execute('''CREATE UNIQUE INDEX startIpNumIx ON blocks(startIpNum);''')
#	cursor.execute('''CREATE UNIQUE INDEX endIpNumIx ON blocks(endIpNum);''')

conn.commit()

	print 'analyze'
	cursor.execute('''ANALYZE;''')

numBlocks = cursor.execute('select count(*)