Re: Utility to report and repair broken surrogate pairs in UTF-16 text

Doug Ewell Fri, 05 Nov 2010 16:34:50 -0700

Markus Scherer wrote:

>> Right, but as I said, those downstream tasks shouldn't be consumers
>> of UTF-16 code units anyway. They should be consumers of Unicode
>> code points, which by definition excludes loose surrogates.
>
> Code points include surrogates. Maybe you mean "UTF-32 code units" or
> "Unicode scalar values".


You're right, I meant Unicode scalar values.

I don't see the difference between allowing loose UTF-16 code units in what purports to be a character stream and allowing loose UTF-8 code units.
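
To make the distinction concrete, here is a minimal sketch in Python. The function names (find_loose_surrogates, is_scalar_value) are made up for illustration and are not taken from any utility discussed in this thread. It assumes the input is already a sequence of 16-bit code units and only reports loose surrogates; it does not repair them.

```python
def is_scalar_value(cp):
    """True if cp is a Unicode scalar value: a code point that is not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def find_loose_surrogates(units):
    """Return the indices of unpaired surrogates in a sequence of UTF-16 code units."""
    loose = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only when a low surrogate follows immediately.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed pair, skip both code units
                continue
            loose.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate that was not consumed as the second half of a pair.
            loose.append(i)
        i += 1
    return loose


# Hypothetical input: 'A', a valid pair, a lone high surrogate, 'B', a lone low surrogate.
sample = [0x0041, 0xD83D, 0xDE00, 0xD800, 0x0042, 0xDC00]
print(find_loose_surrogates(sample))
```

A consumer of scalar values would never see the code units at the reported indices, whereas a consumer of code points in general could.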


--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
