byte count unicode string

willie Wed, 20 Sep 2006 00:23:37 -0700

Marc 'BlackJack' Rintsch:

 >In <[EMAIL PROTECTED]>, willie wrote:
 >> # What's the correct way to get the
 >> # byte count of a unicode (UTF-8) string?
 >> # I couldn't find a builtin method
 >> # and the following is memory inefficient.


 >> ustr = "example\xC2\x9D".decode('UTF-8')

 >> num_chars = len(ustr)    # 8

 >> buf = ustr.encode('UTF-8')

 >> num_bytes = len(buf)     # 9

 >That is the correct way.
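For readers on a modern interpreter, here is a minimal sketch of the quoted approach translated to Python 3. This translation is an assumption: the thread itself uses Python 2, where a plain "..." literal is already a byte string and `.decode()` produces a `unicode` object. The `encode()` call still builds a temporary bytes copy just to measure it.

```python
# Python 3 translation of the quoted Python 2 snippet (assumed, not from the thread).
raw = b"example\xC2\x9D"               # UTF-8 encoded bytes
ustr = raw.decode("utf-8")             # text string; \xC2\x9D decodes to U+009D

num_chars = len(ustr)                  # code points
num_bytes = len(ustr.encode("utf-8"))  # encode() allocates a bytes object to measure
print(num_chars, num_bytes)            # prints "8 9"
```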


# Apologies if I'm being dense, but it seems
# unusual that I'd have to make a copy of a
# unicode string, converting it into a byte
# string, before I can determine the size (in bytes)
# of the unicode string. Can someone provide the rationale
# for that or correct my misunderstanding?

# Thanks.
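One likely reason there is no builtin: a unicode string is not stored as UTF-8 internally (Python 2 builds used UCS-2 or UCS-4; Python 3.3+ uses the per-string representation from PEP 393), so its UTF-8 size is not a stored property and has to be computed. If the temporary copy is the concern, the byte count can be derived from the code points alone, since each one's UTF-8 width is fixed by its value. The sketch below assumes Python 3.3+ (no narrow builds), and `utf8_byte_length` is a hypothetical helper, not a standard library function. It trades memory for speed: a pure-Python loop is usually slower than `encode()`.

```python
def utf8_byte_length(text: str) -> int:
    """Return how many bytes `text` would occupy in UTF-8, without encoding it.

    Hypothetical helper: sums the UTF-8 width of each code point instead of
    building a bytes object.
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1   # U+0000..U+007F: 1 byte
        elif cp < 0x800:
            total += 2   # U+0080..U+07FF: 2 bytes
        elif cp < 0x10000:
            total += 3   # U+0800..U+FFFF: 3 bytes (lone surrogates are
                         # counted here, though a strict encode() rejects them)
        else:
            total += 4   # U+10000..U+10FFFF: 4 bytes
    return total


print(utf8_byte_length("example\x9d"))  # prints 9, same as len(ustr.encode('UTF-8'))
```

For strings without lone surrogates, the result matches `len(text.encode("utf-8"))`.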
-- 
http://mail.python.org/mailman/listinfo/python-list