On 2014-05-22, Peter Otten wrote:

> Adam Funk wrote:

>> Well, J*v* returns a byte array, so I used to do this:
>> 
>>     digester = MessageDigest.getInstance("MD5");
>>     ...
>>     digester.reset();
>>     byte[] digest = digester.digest(bytes);
>>     return new BigInteger(+1, digest);
>
> In Python 3 there's int.from_bytes()
>
> >>> import hashlib
> >>> h = hashlib.sha1(b"Hello world")
> >>> int.from_bytes(h.digest(), "little")
> 538059071683667711846616050503420899184350089339

Excellent, thanks for pointing that out.  I've just recently started
using Python 3 instead of 2, & appreciate pointers to new things like
that.  The only thing that really bugs me in Python 3 is that execfile
has been removed (I find it useful for testing things interactively).
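
One thing to watch if the numbers need to match what the J*v* code
produced: as far as I know, BigInteger(1, digest) treats the byte array
as a big-endian magnitude, so the matching call would use "big" rather
than "little".  A minimal sketch, assuming MD5 as in my old code (the
variable names are just for illustration):

```python
import hashlib

digest = hashlib.md5(b"Hello world").digest()   # 16 raw bytes

# Big-endian: first byte is the most significant, which is how
# BigInteger(1, digest) interprets its magnitude argument.
as_big = int.from_bytes(digest, "big")

# Little-endian: same bytes, read in the opposite order.
as_little = int.from_bytes(digest, "little")

# The big-endian value is simply the hex digest parsed as a number.
assert as_big == int(hashlib.md5(b"Hello world").hexdigest(), 16)
```

Either byte order gives a perfectly good integer; it only matters when
two programs have to agree on the same value.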


>> I dunno why language designers don't make it easy to get a single big
>> number directly out of these things.
>  
> You hardly ever need to manipulate the numerical value of the digest. And on 
> its way into the database it will be re-serialized anyway.

I now agree that my original plan to hash strings for the SQLite3
table was pointless, so I've changed the subject header.  :-)

I have had good reason to use int hashes in the past, however.  I was
doing some experiments with Andrei Broder's "sketches of shingles"
technique for finding partial duplication between documents, & you
need integers for that so you can do modulo arithmetic.
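
Roughly what I mean, as a sketch from memory rather than a faithful
implementation: split each document into overlapping w-word shingles,
hash each shingle to an integer, & keep only the hashes that are
0 mod m.  The function names, the defaults for w & m, & the use of
SHA-1 are all my own assumptions here (I believe the original work
used Rabin fingerprints).

```python
import hashlib

def shingles(text, w=4):
    """Return the set of w-word shingles (as strings) in text."""
    words = text.split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def int_hash(s):
    """Hash a string to a non-negative int (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest(), "big")

def sketch(text, w=4, m=4):
    """Keep only the shingle hashes that are 0 mod m."""
    return {h for h in map(int_hash, shingles(text, w)) if h % m == 0}

def resemblance(a, b):
    """Jaccard overlap of two sketches; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Comparing resemblance(sketch(doc1), sketch(doc2)) then gives a cheap
estimate of how much the two documents overlap, & the "h % m" step is
where you really do need the digest as a number.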

I've also used hashes of strings for other things involving
deduplication or fast lookups (because integer equality is faster than
string equality).  I guess if it's just for deduplication, though, a
set of byte arrays is as good as a set of int?
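
For what it's worth, here is the sort of thing I have in mind, as a
minimal sketch.  It assumes the digests are kept as immutable bytes
objects (which are hashable, unlike bytearray) & that SHA-1 collisions
can be ignored for this purpose; digest_of & dedupe are names I made
up for the example.

```python
import hashlib

def digest_of(s):
    """Return the raw SHA-1 digest of a string as a bytes object."""
    return hashlib.sha1(s.encode("utf-8")).digest()

def dedupe(strings):
    """Yield each distinct string once, in first-seen order."""
    seen = set()              # set of bytes digests, no int conversion
    for s in strings:
        d = digest_of(s)
        if d not in seen:
            seen.add(d)
            yield s
```

So list(dedupe(["a", "b", "a"])) keeps just the first "a" & the "b",
without the digests ever being turned into ints.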


-- 
Classical Greek lent itself to the promulgation of a rich culture,
indeed, to Western civilization.  Computer languages bring us
doorbells that chime with thirty-two tunes, alt.sex.bestiality, and
Tetris clones.                                         (Stoll 1995)
