unicode, bytes redux

willie Sun, 24 Sep 2006 23:38:59 -0700

(beating a dead horse)

Is it too ridiculous to suggest that it'd be nice
if the unicode object were to remember the
encoding of the string it was decoded from?
That way it would be feasible to calculate the number
of bytes that make up the unicode code points.
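For what it's worth, the behaviour can be approximated without changing the built-in type. The following is a minimal sketch, assuming Python 3 (where `str` plays the role of the old `unicode` object). It defines a hypothetical `str` subclass that stores the encoding it was decoded from and re-encodes on demand to count bytes. The names `DecodedText` and `byte_length` are illustrative only and are not part of any standard API.

```python
class DecodedText(str):
    """Hypothetical str subclass that remembers its source encoding."""

    def __new__(cls, raw, encoding):
        # Decode the raw bytes and keep the encoding alongside the text.
        obj = super().__new__(cls, raw.decode(encoding))
        obj.encoding = encoding
        return obj

    def byte_length(self):
        # Re-encode with the remembered encoding and count the bytes.
        return len(self.encode(self.encoding))


u = DecodedText(b"\xE2\x9C\x8C", "UTF-8")  # U+270C
print(len(u))           # 1 (one code point)
print(u.byte_length())  # 3 (three bytes in UTF-8)
```

Note that ordinary string operations such as `u + "x"` return a plain `str`, so the remembered encoding is silently lost. This is one reason the idea is harder to get right as a change to the built-in type.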


# U+270C
# 11100010 10011100 10001100
buf = "\xE2\x9C\x8C"

u = buf.decode('UTF-8')

# ... later ...

u.bytes()  # proposed method; would return 3

(it would go through each code point and calculate
the number of bytes that make up that character
according to the remembered encoding)
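The per-code-point calculation described above can be sketched in a few lines of Python 3. This is an illustrative sketch, not an existing method. The helper name `byte_counts` is hypothetical, and the helper simply encodes each character on its own with the given encoding.

```python
def byte_counts(text, encoding):
    """Return (character, byte count) pairs for each code point in text."""
    # Encode each code point separately and measure the result.
    return [(ch, len(ch.encode(encoding))) for ch in text]


u = b"A\xC3\xA9\xE2\x9C\x8C".decode("UTF-8")  # 'A', U+00E9, U+270C
counts = byte_counts(u, "UTF-8")
print([n for _, n in counts])      # [1, 2, 3]
print(sum(n for _, n in counts))   # 6
print(len(u.encode("UTF-8")))      # 6
```

For UTF-8, summing the per-character lengths gives the same total as encoding the whole string. That is not true for every codec, though. With "UTF-16", each separately encoded character gets its own 2-byte byte order mark, so the per-character sum for the string above is 12 while `len(u.encode("UTF-16"))` is 8. For a total, encoding the whole string is the safer choice.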
-- 
http://mail.python.org/mailman/listinfo/python-list
