On 2014-05-22, Peter Otten wrote:
> Adam Funk wrote:
>
>> Well, J*v* returns a byte array, so I used to do this:
>>
>>     digester = MessageDigest.getInstance("MD5");
>>     ...
>>     digester.reset();
>>     byte[] digest = digester.digest(bytes);
>>     return new BigInteger(+1, digest);
>
> In Python 3 there's int.from_bytes()
>
>>>> h = hashlib.sha1(b"Hello world")
>>>> int.from_bytes(h.digest(), "little")
> 538059071683667711846616050503420899184350089339

Excellent, thanks for pointing that out. I've just recently started
using Python 3 instead of 2, & appreciate pointers to new things like
that. The only thing that really bugs me in Python 3 is that execfile
has been removed (I find it useful for testing things interactively).

>> I dunno why language designers don't make it easy to get a single big
>> number directly out of these things.
>
> You hardly ever need to manipulate the numerical value of the digest.
> And on its way into the database it will be re-serialized anyway.

I now agree that my original plan to hash strings for the SQLite3 table
was pointless, so I've changed the subject header. :-)

I have had good reason to use int hashes in the past, however. I was
doing some experiments with Andrei Broder's "sketches of shingles"
technique for finding partial duplication between documents, & you
need integers for that so you can do modulo arithmetic.

I've also used hashes of strings for other things involving
deduplication or fast lookups (because integer equality is faster than
string equality). I guess if it's just for deduplication, though, a
set of byte arrays is as good as a set of int?

-- 
Classical Greek lent itself to the promulgation of a rich culture,
indeed, to Western civilization. Computer languages bring us doorbells
that chime with thirty-two tunes, alt.sex.bestiality, and Tetris clones.
(Stoll 1995)
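A small sketch relating the two snippets quoted above. Java's `new BigInteger(+1, digest)` treats the byte array as an unsigned big-endian magnitude, so the matching Python 3 call is `int.from_bytes(digest, "big")`; the `"little"` order in the quoted session also gives a valid integer, just a different one for the same digest. MD5 is used here only to mirror the Java code; the variable names are illustrative.

```python
import hashlib

# Hash some bytes, mirroring the Java MessageDigest("MD5") example.
h = hashlib.md5(b"Hello world")
digest = h.digest()  # 16 bytes, most significant byte first

# BigInteger(+1, digest) reads the array as an unsigned big-endian
# magnitude, so "big" is the byte order that reproduces it.
as_int_big = int.from_bytes(digest, "big")

# The same number can be read straight from the hex digest.
assert as_int_big == int(h.hexdigest(), 16)

# "little" is also a valid reading, but yields a different integer.
as_int_little = int.from_bytes(digest, "little")

# Round trip: int.to_bytes recovers the original 16-byte digest.
assert as_int_big.to_bytes(len(digest), "big") == digest

print(as_int_big)
print(as_int_little)
```

Either byte order works as long as it is used consistently; `"big"` only matters if the values must agree with what the Java code produced.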
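On the missing execfile: a common Python 3 substitute is to `compile()` the file's source and pass it to `exec()`. The helper below is a hypothetical stand-in, not a standard function. Unlike Python 2's builtin, it does not default to the caller's globals; at the interactive prompt you would pass `globals()` explicitly. The standard library's `runpy.run_path(path)` is another option, which runs the file and returns its resulting globals as a dict.

```python
import os
import tempfile


def execfile(path, namespace=None):
    """Hypothetical Python 3 stand-in for Python 2's execfile.

    Runs the file at `path` in the dict `namespace` and returns it.
    Pass globals() at the interactive prompt to run the file "in place".
    """
    if namespace is None:
        namespace = {"__name__": "__main__"}
    with open(path, "rb") as f:
        source = f.read()
    # Compiling with the real filename keeps tracebacks pointing at the file.
    code = compile(source, path, "exec")
    exec(code, namespace)
    return namespace


# Usage: write a throwaway script and run it, as you might interactively.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("x = 6 * 7\n")
try:
    demo_ns = execfile(tmp.name)
finally:
    os.remove(tmp.name)

print(demo_ns["x"])  # 42
```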
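A rough sketch of where integer hashes and modulo arithmetic come in for shingling. Each document is split into overlapping w-word "shingles", each shingle is mapped to an integer, and only integers divisible by m are kept as the document's sketch; the Jaccard resemblance of two sketches then estimates the resemblance of the full shingle sets. This uses mod-m sampling; Broder also describes keeping the s smallest values instead. MD5 via `int.from_bytes` is a stand-in fingerprint (Broder used Rabin fingerprints), and the helper names and parameters `w=4`, `m=4` are hypothetical choices.

```python
import hashlib


def shingles(text, w=4):
    """Set of contiguous w-word sequences ("shingles") in text."""
    words = text.lower().split()
    # Texts shorter than w words yield a single (short) shingle.
    return {" ".join(words[i:i + w]) for i in range(max(1, len(words) - w + 1))}


def fingerprint(shingle):
    """Stand-in fingerprint: the MD5 digest read as a big-endian int."""
    return int.from_bytes(hashlib.md5(shingle.encode("utf-8")).digest(), "big")


def sketch_mod(text, m=4, w=4):
    """Keep only the fingerprints divisible by m (a mod-m sample)."""
    return {fp for fp in map(fingerprint, shingles(text, w)) if fp % m == 0}


def resemblance(a, b):
    """Jaccard resemblance |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc_a = "the quick brown fox jumps over the lazy dog near the old river bank"
doc_b = "the quick brown fox leaps over the lazy dog near the old river bank"

# Exact resemblance over all shingle fingerprints.
full_a = {fingerprint(s) for s in shingles(doc_a)}
full_b = {fingerprint(s) for s in shingles(doc_b)}
exact = resemblance(full_a, full_b)

# Estimate from the much smaller mod-m sketches.
estimate = resemblance(sketch_mod(doc_a), sketch_mod(doc_b))
print(exact, estimate)
```

With documents this short the sketches hold only a few values, so the estimate is noisy; the sampling pays off on long documents, where the sketch is far smaller than the full shingle set.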
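On the closing question: in Python 3, `digest()` returns `bytes`, which is immutable and hashable, so a set of digests works for deduplication without converting to int. A mutable `bytearray`, by contrast, is unhashable and cannot go in a set. The sketch below is illustrative; `dedup` is a hypothetical helper, and, like any digest-based scheme, it would treat two different lines with colliding digests as duplicates.

```python
import hashlib


def dedup(lines):
    """Yield each line the first time its MD5 digest is seen (hypothetical helper)."""
    seen = set()  # holds 16-byte `bytes` digests; no int conversion needed
    for line in lines:
        key = hashlib.md5(line.encode("utf-8")).digest()
        if key not in seen:
            seen.add(key)
            yield line


unique = list(dedup(["spam", "eggs", "spam", "ham", "eggs"]))
print(unique)  # ['spam', 'eggs', 'ham']

# bytes are hashable, but a mutable bytearray is not.
try:
    {bytearray(b"abc")}
except TypeError as exc:
    print("bytearray cannot go in a set:", exc)
```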