Re: How does Python Unicode treat surrogates?

DougEwell2 Mon, 25 Jun 2001 22:05:56 -0700

In a message dated 2001-06-25 20:19:18 Pacific Daylight Time, [EMAIL PROTECTED] 
writes:

>  (For instance, I
>  don't see how it would be possible to encode a sequence of unicode
>  scalar values corresponding to a low and a high surrogate; if you
>  tried to map this back then you would get a single unicode scalar
>  value outside of the BMP).  Perhaps someone on the unicode list could
>  elaborate?

This is the source of my remaining confusion about definition D29.  It 
requires UTFs to round-trip all Unicode code points, and by extension all 
sequences of code points; yet if you use UTF-16 and start with the sequence 
<D800 DC00>, you don't end up with that -- you end up with <10000>.

The way it was explained to me on this list made it sound as though UTF-16 is 
the "master" UTF that other UTFs have to accommodate.  That didn't make sense 
to me, but I've been trying to cope with it.

Proposed UTFs that are based on UTF-16 code units, and are thus subject to 
the same D29 limitations as UTF-16, really annoy me, though.

-Doug Ewell
 Fullerton, California

Re: How does Python Unicode treat surrogates?

Reply via email to