Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Hrvoje Niksic Tue, 28 Apr 2009 06:06:46 -0700

Lino Mastrodomenico wrote:

Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid character 
when
decoded with UTF-8, it should simply be considered an invalid UTF-8
sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
'\udcff').

"Should be considered" or "will be considered"? Python 3.0's UTF-8decoder happily accepts it and returns u'\udcff':


>>> b'\xed\xb3\xbf'.decode('utf-8')
'\udcff'

If the PEP depends on this being changed, it should be mentioned in the PEP.
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Reply via email to