willie wrote:
> Marc 'BlackJack' Rintsch:
>
> >In <[EMAIL PROTECTED]>, willie wrote:
> >> # What's the correct way to get the
> >> # byte count of a unicode (UTF-8) string?
> >> # I couldn't find a builtin method
> >> # and the following is memory inefficient.
>
> >> ustr = "example\xC2\x9D".decode('UTF-8')
>
> >> num_chars = len(ustr) # 8
>
> >> buf = ustr.encode('UTF-8')
>
> >> num_bytes = len(buf) # 9
>
> >That is the correct way.
>
>
> # Apologies if I'm being dense, but it seems
> # unusual that I'd have to make a copy of a
> # unicode string, converting it into a byte
> # string, before I can determine the size (in bytes)
> # of the unicode string. Can someone provide the rationale
> # for that or correct my misunderstanding?
>
You initially asked "What's the correct way to get the byte count of a
unicode (UTF-8) string".
It appears you meant "How can I find how many bytes there are in the
UTF-8 representation of a Unicode string without manifesting the UTF-8
representation?".
The answer is, "You can't", and the rationale would have to be that
nobody thought of a use case for counting the length of the UTF-8 form
but not creating the UTF-8 form. What is your use case?
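If the temporary copy really is the problem, the count can be computed by hand, since UTF-8 uses a fixed number of bytes per code point range. The following is only a minimal sketch, not a builtin: `utf8_len` is a hypothetical helper name, and it assumes Python 3, where iterating over a str yields whole code points (on a narrow Python 2 build, surrogate pairs would need extra handling).

```python
def utf8_len(text):
    """Count the bytes UTF-8 would need for text, without encoding it."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:        # U+0000..U+007F: 1 byte
            total += 1
        elif cp < 0x800:     # U+0080..U+07FF: 2 bytes
            total += 2
        elif cp < 0x10000:   # U+0800..U+FFFF: 3 bytes
            total += 3
        else:                # U+10000..U+10FFFF: 4 bytes
            total += 4
    return total


# Same data as the quoted example, written as a bytes literal for Python 3.
ustr = b"example\xC2\x9D".decode("UTF-8")
print(len(ustr))       # 8
print(utf8_len(ustr))  # 9
```

In CPython this loop is likely slower than len(ustr.encode('UTF-8')); all it saves is the temporary byte string, so it is only worth considering for very large strings.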
Cheers,
John
--
http://mail.python.org/mailman/listinfo/python-list