Re: byte count unicode string

John Machin Wed, 20 Sep 2006 00:40:52 -0700

willie wrote:
> Marc 'BlackJack' Rintsch:
>
>  >In <[EMAIL PROTECTED]>, willie wrote:
>  >> # What's the correct way to get the
>  >> # byte count of a unicode (UTF-8) string?
>  >> # I couldn't find a builtin method
>  >> # and the following is memory inefficient.
>
>  >> ustr = "example\xC2\x9D".decode('UTF-8')
>
>  >> num_chars = len(ustr)    # 8
>
>  >> buf = ustr.encode('UTF-8')
>
>  >> num_bytes = len(buf)     # 9
>
>  >That is the correct way.
>
>
> # Apologies if I'm being dense, but it seems
> # unusual that I'd have to make a copy of a
> # unicode string, converting it into a byte
> # string, before I can determine the size (in bytes)
> # of the unicode string. Can someone provide the rationale
> # for that or correct my misunderstanding?
>


You initially asked "What's the correct way to get the byte count of a
unicode (UTF-8) string".

It appears you meant "How can I find how many bytes there are in the
UTF-8 representation of a Unicode string without manifesting the UTF-8
representation?".

The answer is, "You can't", and the rationale would have to be that
nobody thought of a use case for counting the length of the UTF-8 form
but not creating the UTF-8 form. What is your use case?
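
For what it's worth, there is no builtin for this, but the count can
be worked out by hand from the code points, since UTF-8 uses 1 to 4
bytes per code point depending on its range. Below is a minimal sketch,
written in Python 3 syntax rather than the Python 2 of the quoted code.
utf8_len is a hypothetical helper name, not a standard function, and a
pure-Python loop like this is likely slower than simply encoding and
calling len(), so it only trades speed for not building the byte string.

```python
def utf8_len(text):
    """Count the bytes UTF-8 would need for text, without encoding it.

    Hypothetical helper, not a builtin. Lone surrogates are counted as
    3 bytes here, although str.encode('utf-8') would refuse them.
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:        # U+0000..U+007F: 1 byte
            total += 1
        elif cp < 0x800:     # U+0080..U+07FF: 2 bytes
            total += 2
        elif cp < 0x10000:   # U+0800..U+FFFF: 3 bytes
            total += 3
        else:                # U+10000..U+10FFFF: 4 bytes
            total += 4
    return total


# Same sample string as in the quoted question.
ustr = b"example\xC2\x9D".decode("utf-8")
assert utf8_len(ustr) == len(ustr.encode("utf-8"))
```

The final assert checks the helper against the encode-then-len approach
for the sample string from the question.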

Cheers,
John

-- 
http://mail.python.org/mailman/listinfo/python-list
