I'm using Python 3.3 and the sqlite3 module in the standard library.
I'm processing a lot of strings from input files (among other things,
values of headers in e-mail & news messages) and suppressing
duplicates using a table of seen strings in the database.

It seems to me --- from past experience with other things, where
testing integers for equality is faster than testing strings, as well
as from reading the SQLite3 documentation about INTEGER PRIMARY KEY
--- that the SELECT tests should be faster if I am looking up an
INTEGER PRIMARY KEY value rather than TEXT PRIMARY KEY.  Is that

If so, what sort of hashing function should I use?  The "maxint" for
SQLite3 is a lot smaller than the size of even MD5 hashes.  The only
thing I've thought of so far is to use MD5 or SHA-something modulo the
maxint value.  (Security isn't an issue --- i.e., I'm not worried
about someone trying to create a hash collision.)


"It is the role of librarians to keep government running in difficult
times," replied Dramoren.  "Librarians are the last line of defence
against chaos."                                       (McMullen 2001)

Reply via email to