The id2name.txt file is an index of primary keys to strings. Each line looks
like this:
11293102971459182412:Descriptive unique name for this record\n
950918240981208142:Another name for another record\n
The file's properties are:
# wc -l id2name.txt
8191180 id2name.txt
# du -h id2name.txt
517M id2name.txt
I'm loading the file into memory with code like this:
id2name = {}
for line in iter(open('id2name.txt').readline, ''):
    id, name = line.strip().split(':')
    id = long(id)
    id2name[id] = name
This takes about 45 *minutes* to run. If I comment out the last line in the
loop body, it takes only about 30 _seconds_.
This would seem to implicate the line id2name[id] = name as being
excruciatingly slow.
Is there a fast, functionally equivalent way of doing this?
(Yes, I really do need this cached. No, an RDBMS or disk-based hash is not
fast enough.)
--
http://mail.python.org/mailman/listinfo/python-list