Re: [Python-Dev] surrogatepass - she's a witch, burn 'er! [was: Cleaning up ...]

Isaac Morland Fri, 29 Aug 2014 08:38:01 -0700

On Fri, 29 Aug 2014, M.-A. Lemburg wrote:

> On 29.08.2014 02:41, Stephen J. Turnbull wrote:
> Since Python allows working with lone surrogates in Unicode (they
> are valid code points) and we're using UTF-8 for marshal, we needed
> a way to make sure that Python 3 also optionally supports working
> with lone surrogates in such UTF-8 streams (nowadays called CESU-8:
> http://en.wikipedia.org/wiki/CESU-8).


If I want that, wouldn't I specify "cesu-8" as the encoding?

I.e., instead of .decode ('utf-8') I would use .decode ('cesu-8'). Right now, when I try this I get an error saying that cesu-8 is an unknown encoding, but that could be changed without affecting the behaviour of the utf-8 codec.
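A separate codec could be plugged in through the standard codecs.register() hook. The following is only a rough sketch, and it makes several assumptions. It assumes CESU-8 as described on the Wikipedia page cited above: characters outside the BMP are written as a UTF-16 surrogate pair, and each half is encoded as its own 3-byte sequence. The codec name 'cesu-8' and all helper functions are hypothetical, because Python does not ship such a codec. The sketch reuses 'surrogatepass' internally and leaves the 'utf-8' codec untouched. It is also deliberately lenient, so it does not reject input that a strict CESU-8 decoder would refuse.

```python
import codecs


def _split_non_bmp(text):
    """Replace each character above U+FFFF by its UTF-16 surrogate pair."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate
        else:
            out.append(ch)
    return "".join(out)


def cesu8_encode(text, errors="strict"):
    # Each surrogate (paired or lone) becomes its own 3-byte sequence.
    # Hypothetical sketch: the 'errors' argument is ignored.
    data = _split_non_bmp(text).encode("utf-8", "surrogatepass")
    return data, len(text)


def cesu8_decode(data, errors="strict"):
    raw = bytes(data)
    # First pass: 3-byte surrogate sequences come back as separate code points.
    # (Lenient: 4-byte UTF-8 sequences are accepted here as well.)
    units = raw.decode("utf-8", "surrogatepass")
    # Second pass: the UTF-16 codec joins adjacent high/low pairs into one
    # character; lone surrogates pass through unchanged.
    text = units.encode("utf-16-be", "surrogatepass").decode(
        "utf-16-be", "surrogatepass")
    return text, len(raw)


def _search(name):
    # Depending on the Python version the name arrives as "cesu-8" or "cesu_8".
    if name in ("cesu-8", "cesu_8", "cesu8"):
        return codecs.CodecInfo(cesu8_encode, cesu8_decode, name="cesu-8")
    return None


codecs.register(_search)

smile = "\U0001F600"
encoded = smile.encode("cesu-8")
print(encoded)                            # b'\xed\xa0\xbd\xed\xb8\x80'
print(encoded.decode("cesu-8") == smile)  # True
print(smile.encode("utf-8"))              # b'\xf0\x9f\x98\x80'
```

The utf-8 codec still produces the 4-byte form. Only callers who ask for the new name get the surrogate-pair form.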

It seems to me that .decode ('utf-8') should decode exactly and only valid UTF-8, which means not accepting surrogate pairs as an intermediate encoding step.
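When no error handler is given, Python 3's 'utf-8' codec already draws this line. The 3-byte sequences that CESU-8 uses for surrogate halves are rejected, and a non-BMP character is accepted only in its 4-byte form. The short check below uses only the standard library. Both byte strings are encodings of U+1F600, chosen here purely as an example.

```python
cesu_style = b"\xed\xa0\xbd\xed\xb8\x80"  # U+1F600 as two 3-byte surrogate halves
utf8_form = b"\xf0\x9f\x98\x80"           # U+1F600 as one 4-byte sequence

# Valid UTF-8 decodes to a single character.
assert utf8_form.decode("utf-8") == "\U0001F600"

# The surrogate-pair form is not valid UTF-8, so strict decoding fails.
try:
    cesu_style.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print("strict utf-8 rejects surrogate bytes:", rejected)  # True

# Only an explicit opt-in accepts them, and even then the halves stay separate.
halves = cesu_style.decode("utf-8", "surrogatepass")
print([hex(ord(c)) for c in halves])  # ['0xd83d', '0xde00']
```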


Isaac Morland                   CSCF Web Guru
DC 2554C, x36650                WWW Software Specialist
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com