Jeff H wrote:
hashlib.md5 does not appear to like unicode,
  UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in position 1650: ordinal not in range(128)

After googling, I found the BDFL and others in Py3K discussions talking about the
problems of hashing non-bytes (i.e. anything that isn't a buffer):
http://www.mail-archive.com/[EMAIL PROTECTED]/msg09824.html

So what is the canonical way to hash unicode?
 * encode the unicode string in the current locale's encoding
 * hash the resulting bytes
???
But what if the locale's encoding has ordinals of 128 or above?

Is this a problem specific to md5 hashes that I would not encounter using
a different method? E.g., should I just use the built-in hash function?
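hashlib.md5 can handle bytestrings, but if you give it a unicode string it first applies the default encoding (ASCII), which fails as soon as there's a codepoint >= U+0080. Personally, I'd recommend explicitly encoding the unicode string to UTF-8 before hashing. UTF-8 can represent every codepoint and doesn't depend on the current locale, so the same text always produces the same digest.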
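A minimal sketch of the encode-then-hash approach, assuming Python 3 (where `str` is Unicode and hashlib refuses `str` outright with a TypeError instead of encoding it implicitly); `md5_of_text` is a hypothetical helper name, not part of hashlib. The `.encode()` and `.hexdigest()` calls behave the same way on a Python 2 `unicode` object. It also shows why a locale-dependent 8-bit encoding is a poor choice: the same text gives different bytes, and therefore a different digest.

```python
import hashlib


def md5_of_text(text, encoding="utf-8"):
    """Return the hex MD5 digest of a Unicode string.

    Hypothetical helper: the text is encoded to bytes explicitly before
    hashing, so the digest does not depend on the interpreter's default
    codec or on the current locale.
    """
    return hashlib.md5(text.encode(encoding)).hexdigest()


# u'\xa6' (BROKEN BAR) is the character from the traceback quoted above.
sample = u"caf\xe9 \xa6 ok"

# UTF-8 works for any codepoint.
print(md5_of_text(sample))

# A locale-style 8-bit encoding such as Latin-1 yields different bytes
# for the same text, so the digest differs as well.
print(md5_of_text(sample, "latin-1"))
```

Whichever encoding you pick, every party that computes or checks the hash has to use the same one; fixing it to UTF-8 in code, rather than inheriting it from the locale, is what keeps digests stable across machines.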
--
http://mail.python.org/mailman/listinfo/python-list