I did some quick tests:

I took a 100,000 entry dictionary and made a single table

"create table dict (id INTEGER PRIMARY KEY, topics, def)"

I ran a "for loop" on the cursor object to extract all the topic column data and put in in a Python list .

cur.execute('SELECT topics FROM dict')
start = time.time()
for item in cur:
    tmp.append(item[0])
print time.time() - start, 'get WL'

The time was around .035 sec.

With an index created,

 'CREATE INDEX dctidx ON dict (id, topics)'

There was no difference in time. The file was around 2MB larger.

I made a second table with just the topic data,

"create table topics (topic)"

with this select,

'SELECT topic FROM topics'

The time was roughly half at around .017 sec.

I made a separate file with just the topic data table as above and the access time was only slightly less .014-.015 sec.

Also, I populated the topics table completely and then populated the def table, with no difference in speed if they were populated in the same loop.

So it seems the index is no help; a second topics table is a significant help; and a separate file with topics table a negligible help; and creation order is no help.

They were all good ideas to try, though.

Tim




On 3/20/2012 2:36 PM, Simon Slavin wrote:
On 20 Mar 2012, at 6:11pm, Tim Morton<[email protected]>  wrote:

I may try this, but the topics are usually just one word each so I doubt an 
index could reduce it much.
Max's trick is something that takes advantage of how SQLite works entirely.  If 
you do a SELECT like

SELECT id,topics FROM dictionary ORDER BY topics

and there's an index like

CREATE INDEX dti ON dictionary (topics,id)

then SQLite will use that index to find the right order to retrieve the records 
in.  But having figured out what rows it needs it realises that it has all the 
information -- both the id and the topic -- right there in the index.  So it 
doesn't bother then looking up the TABLE data to fetch the other columns, it 
just returns all the information you asked for right from the index it's 
already looking at.  It's fast and, because the index contains only the data 
you want, it's equivalent to making up a separate table which has just the id 
and topics columns.

I forgot about that trick when I replied but Max didn't.

Simon.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to