On 30.08.2014 01:37, Greg Ewing wrote:
> M.-A. Lemburg wrote:
>> we needed
>> a way to make sure that Python 3 also optionally supports working
>> with lone surrogates in such UTF-8 streams (nowadays called CESU-8:
>> http://en.wikipedia.org/wiki/CESU-8).
>
> I don't think CESU-8 is the same thing. According to the wiki
> page, CESU-8 *requires* all code points above 0xffff to be split
> into surrogate pairs before encoding. It also doesn't say that
> lone surrogates are valid -- it doesn't mention lone surrogates
> at all, only pairs. Neither does the linked technical report.
>
> The technical report also says that CESU-8 forbids any UTF-8
> sequences of more than three bytes, so it's definitely not
> "UTF-8 plus lone surrogates".

You're right, it's not the same as UTF-8 plus lone surrogates.

CESU-8 does encode surrogates as individual code points using
the UTF-8 encoding, which is what probably caused it to be mentioned
in discussions when talking about having UTF-8 streams do the same
for lone surrogates.

So let's call the encoding UTF-8-py so that everyone knows what
we're talking about :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 30 2014)
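
To make the distinction above concrete, here is a minimal Python 3 sketch. It assumes that "UTF-8-py" corresponds to the standard utf-8 codec used with the surrogatepass error handler; that mapping is an assumption made for illustration. The helper names cesu8_encode and utf8_py_encode are hypothetical and not part of any library.

```python
def cesu8_encode(text):
    """Illustrative CESU-8 encoder (hypothetical helper, not a stdlib codec).

    Code points above 0xFFFF are split into a surrogate pair and each
    half is written as a three-byte sequence, so no output sequence is
    ever longer than three bytes.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
        else:
            # Strict on purpose: a lone surrogate in the input raises
            # here, since CESU-8 only talks about pairs.
            out += ch.encode("utf-8")
    return bytes(out)


def utf8_py_encode(text):
    """Assumed meaning of "UTF-8-py": UTF-8 that lets lone surrogates through."""
    return text.encode("utf-8", "surrogatepass")


astral = "\U00010400"   # a code point above 0xFFFF
lone = "\ud800"         # a lone high surrogate

print(cesu8_encode(astral))    # six bytes: two three-byte sequences
print(utf8_py_encode(astral))  # four bytes: ordinary UTF-8
print(utf8_py_encode(lone))    # three bytes for the lone surrogate

try:
    lone.encode("utf-8")       # strict UTF-8 rejects lone surrogates
except UnicodeEncodeError as exc:
    print("strict utf-8 refuses:", exc.reason)
```

The two encodings only agree inside the BMP: for U+10400 the CESU-8 sketch emits six bytes while the surrogatepass variant emits the usual four-byte UTF-8 sequence, which CESU-8 forbids.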