Re: Newbie question about text encoding

Chris Angelico Sat, 07 Mar 2015 08:43:46 -0800

On Sun, Mar 8, 2015 at 3:25 AM, Marko Rauhamaa <[email protected]> wrote:
> Chris Angelico <[email protected]>:
>
>> On Sun, Mar 8, 2015 at 2:48 AM, Marko Rauhamaa <[email protected]> wrote:
>>> Steven D'Aprano <[email protected]>:
>>>
>>>> Marko Rauhamaa wrote:
>>>>
>>>>> That said, UTF-8 does suffer badly from its not being
>>>>> a bijective mapping.
>>>>
>>>> Can you explain?
>>>
>>> In Python terms, there are bytes objects b that don't satisfy:
>>>
>>>    b.decode('utf-8').encode('utf-8') == b
>>
>> Please provide an example; that sounds like a bug. If there is any
>> invalid UTF-8 stream which decodes without an error, it is actually a
>> security bug, and should be fixed pronto in all affected and supported
>> versions.
>
> Here's an example:
>
>    b = b'\x80'
>
> Yes, it generates an exception. IOW, UTF-8 is not a bijective mapping
> from str objects to bytes objects.


That's not the same as what you said. All you've proven is that there
are byte sequences which are not valid UTF-8 streams... which is a very
deliberate feature. How does UTF-8 *suffer* from this? It benefits
hugely!
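
A minimal sketch of the distinction, assuming Python 3 and the default
strict error handler. The helper names is_valid_utf8 and round_trips
are illustrative only and do not come from the thread:

```python
def is_valid_utf8(b: bytes) -> bool:
    """Return True if b decodes as UTF-8 under the strict error handler."""
    try:
        b.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def round_trips(b: bytes) -> bool:
    """Return True if b decodes as UTF-8 and re-encodes to the same bytes.

    Raises UnicodeDecodeError if b is not valid UTF-8.
    """
    return b.decode('utf-8').encode('utf-8') == b


# Any bytes produced by encoding a str round-trip exactly.
valid = 'na\u00efve \u2603'.encode('utf-8')
print(is_valid_utf8(valid), round_trips(valid))

# A lone continuation byte is rejected at the decode step, so the
# encode step is never reached.
print(is_valid_utf8(b'\x80'))

# An overlong encoding of '/' is rejected as well, instead of being
# silently decoded to the same character as b'/'.
print(is_valid_utf8(b'\xc0\xaf'))
```

Under these assumptions, every byte sequence that decodes at all encodes
back to itself, and the sequences that fail are rejected up front rather
than being mapped onto some other string.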

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list