Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Martin v. Löwis Tue, 28 Apr 2009 09:49:46 -0700

> It does solve this issue, because (unlike e.g. U+F01FF) '\udcff' is
> not a valid Unicode character (not a character at all, really) and the
> only way you can put this in a POSIX filename is if you use a very
> lenient  UTF-8 encoder that gives you b'\xed\xb3\xbf'.
> 
> Since this byte sequence doesn't represent a valid character when
> decoded with UTF-8, it should simply be considered an invalid UTF-8
> sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
> '\udcff').
> 
> Martin: maybe the PEP should say this explicitly?


Sure, will do.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Reply via email to