On 14.11.2012 01:50, Richard wrote:
> These URL IDs would just be used internally for quick lookups, not exposed
> publicly in a web application.
>
> Ideally I would want to avoid collisions altogether. But if that means
> significant extra CPU time then 1 collision in 10 million hashes would be
> tolerable.

Are you storing the URLs in any kind of database, such as a SQL database? A proper index on the URL column will avoid full table scans. It will give you almost O(1) complexity on lookups and O(n) worst-case complexity when hashes collide.

-- 
http://mail.python.org/mailman/listinfo/python-list
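
To illustrate the indexing suggestion, here is a minimal sketch using Python's built-in sqlite3 module. The table name `urls`, the index name `idx_urls_url`, and the helper functions are made up for this example and are not from the original thread. Note also that SQLite indexes are B-trees, so lookups here are O(log n) rather than O(1); a database with hash indexes would be closer to the figure quoted above. Either way, the database's integer row ID serves as a collision-free URL ID, and the query plan shows that the lookup uses the index instead of scanning the table.

```python
import sqlite3


def build_url_table(urls):
    """Create an in-memory table of URLs with a unique index on the URL column."""
    conn = sqlite3.connect(":memory:")
    # The integer primary key doubles as a collision-free internal URL ID.
    conn.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
    # The unique index avoids full table scans and rejects duplicate URLs.
    conn.execute("CREATE UNIQUE INDEX idx_urls_url ON urls (url)")
    conn.executemany(
        "INSERT OR IGNORE INTO urls (url) VALUES (?)",
        [(u,) for u in urls],
    )
    conn.commit()
    return conn


def lookup_id(conn, url):
    """Return the ID stored for url, or None if the URL is unknown."""
    row = conn.execute("SELECT id FROM urls WHERE url = ?", (url,)).fetchone()
    return row[0] if row else None


def query_plan(conn, url):
    """Return SQLite's description of how it would execute the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM urls WHERE url = ?", (url,)
    ).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " ".join(str(r[-1]) for r in rows)


if __name__ == "__main__":
    conn = build_url_table([
        "http://example.com/a",
        "http://example.com/b",
        "http://example.com/a",  # duplicate, ignored by the unique index
    ])
    print(lookup_id(conn, "http://example.com/b"))
    print(query_plan(conn, "http://example.com/b"))
```

If the plan text mentions `idx_urls_url`, the lookup is going through the index rather than scanning every row.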