Jeff H wrote:
[...] So once I have character strings transformed
internally to unicode objects, I should encode them in 'utf-8' before
attempting to do things that guess at the proper way to encode them
for further processing (i.e. hashlib).

It looks like hashlib in Python 3 will not even attempt to digest a unicode object. Trying to hash 'abcdefg' in Python 3.0rc3, I get:

  TypeError: object supporting the buffer API required

I think that's good behavior, except that the error message is likely to send beginners to look up the obscure buffer interface before they find they just need mystring.encode('utf8') or bytes(mystring, 'utf8').
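For anyone following along on a current Python 3, here is a minimal sketch of both spellings. The string value is only an example, and the exact wording of the TypeError has changed across Python 3 releases, so the sketch just checks that one is raised rather than matching the message.

```python
import hashlib

text = "abcdefg"  # a str (Unicode) object in Python 3

# Passing the str directly is refused: hashlib wants bytes-like input.
try:
    hashlib.md5(text)
    rejected_str = False
except TypeError as exc:
    rejected_str = True
    print("TypeError:", exc)  # wording varies by Python version

# Both spellings produce the same UTF-8 bytes, hence the same digest.
d1 = hashlib.md5(text.encode("utf-8")).hexdigest()
d2 = hashlib.md5(bytes(text, "utf-8")).hexdigest()
print(d1 == d2)  # True
```

str.encode() is usually the more readable choice; bytes(s, encoding) does the same conversion.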

>>> import hashlib
>>> a='André'
>>> b=unicode(a,'cp1252')
>>> b
u'Andr\xc3\xa9'
>>> hashlib.md5(b.encode('utf-8')).hexdigest()
'b4e5418a36bc4badfc47deb657a2b50c'
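The same idea in Python 3 terms, as a sketch: decode incoming bytes once with the codec they were actually written in, then encode explicitly right before hashing. The cp1252 input below is a hypothetical stand-in. Note that the repr u'Andr\xc3\xa9' in the session above looks like UTF-8 bytes that were decoded as cp1252, which yields the two characters 'Ã©' rather than 'é'; the digest only means something if that decode step used the right codec.

```python
import hashlib

# Hypothetical input: raw bytes that arrived in the cp1252 code page.
raw = "André".encode("cp1252")  # b'Andr\xe9'

# Decode once, at the boundary, to get a str...
name = raw.decode("cp1252")  # 'André'

# ...and encode explicitly right before hashing.
utf8_digest = hashlib.md5(name.encode("utf-8")).hexdigest()

# Encoding the same text another way gives different bytes,
# so the digest differs: the encoding is part of what gets hashed.
cp1252_digest = hashlib.md5(name.encode("cp1252")).hexdigest()
print(utf8_digest != cp1252_digest)  # True
```

Picking one encoding (UTF-8 is the usual choice) and applying it everywhere a digest is computed keeps hashes comparable across programs.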

Incidentally, MD5 has fallen and SHA-1 is falling. Python's hashlib also includes the stronger SHA-2 family: sha224, sha256, sha384 and sha512.
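A short sketch of the SHA-2 constructors, assuming the text is encoded as UTF-8 first as discussed above. hashlib.new() takes the algorithm name as a string; the named constructors such as hashlib.sha256() do the same thing.

```python
import hashlib

data = "abcdefg".encode("utf-8")  # encode first, exactly as with md5

# Look up each SHA-2 variant by name and report its digest size.
for algo in ("sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(algo, data)
    print(algo, h.digest_size * 8, "bits:", h.hexdigest())

# The named constructor is equivalent to hashlib.new("sha256", data).
sha256_hex = hashlib.sha256(data).hexdigest()
```

Swapping md5 for sha256 is a one-word change; the encode-before-hashing rule is the same for every algorithm.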


--
--Bryan
--
http://mail.python.org/mailman/listinfo/python-list
