I did some quick tests:
I took a 100,000 entry dictionary and made a single table
"create table dict (id INTEGER PRIMARY KEY, topics, def)"
I ran a "for loop" on the cursor object to extract all the topic column
data and put in in a Python list .
cur.execute('SELECT topics FROM dict')
start = time.time()
for item in cur:
tmp.append(item[0])
print time.time() - start, 'get WL'
The time was around .035 sec.
With an index created,
'CREATE INDEX dctidx ON dict (id, topics)'
There was no difference in time. The file was around 2MB larger.
I made a second table with just the topic data,
"create table topics (topic)"
with this select,
'SELECT topic FROM topics'
The time was roughly half at around .017 sec.
I made a separate file with just the topic data table as above and the
access time was only slightly less .014-.015 sec.
Also, I populated the topics table completely and then populated the def
table, with no difference in speed if they were populated in the same loop.
So it seems the index is no help; a second topics table is a significant
help; and a separate file with topics table a negligible help; and
creation order is no help.
They were all good ideas to try, though.
Tim
On 3/20/2012 2:36 PM, Simon Slavin wrote:
On 20 Mar 2012, at 6:11pm, Tim Morton<[email protected]> wrote:
I may try this, but the topics are usually just one word each so I doubt an
index could reduce it much.
Max's trick is something that takes advantage of how SQLite works entirely. If
you do a SELECT like
SELECT id,topics FROM dictionary ORDER BY topics
and there's an index like
CREATE INDEX dti ON dictionary (topics,id)
then SQLite will use that index to find the right order to retrieve the records
in. But having figured out what rows it needs it realises that it has all the
information -- both the id and the topic -- right there in the index. So it
doesn't bother then looking up the TABLE data to fetch the other columns, it
just returns all the information you asked for right from the index it's
already looking at. It's fast and, because the index contains only the data
you want, it's equivalent to making up a separate table which has just the id
and topics columns.
I forgot about that trick when I replied but Max didn't.
Simon.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users