Re: Why are the low surrogates numerically larger than the high surrogates?

Markus Scherer Wed, 23 Jan 2013 10:30:31 -0800

On Wed, Jan 23, 2013 at 9:45 AM, Costello, Roger L. <[email protected]> wrote:


> Hi Folks,
>
> The book Unicode Demystified says this (page 190, first paragraph):
>
>     The surrogate range is divided in half.
>     The range from U+D800 to U+DBFF contains
>     the "high surrogates," and the range from
>     U+DC00 to U+DFF contains the "low surrogates."
>
> Why are the low surrogates numerically larger than the high surrogates?
>
> That is, why isn't U+D800 to U+DBFF called the low surrogates and U+DC00
> to U+DFF called the high surrogates?
>

The high surrogates contain the high-order bits of the code point, and the
low surrogates contain the low-order bits.
(The last one is U+DFFF not U+DFF of course.)
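To make the naming concrete, here is a small Python sketch of how UTF-16 splits a supplementary code point. It is an illustration only, not taken from the book or from this thread, and the function names `to_surrogates` and `from_surrogates` are hypothetical. Subtract 0x10000 to get a 20-bit value. The top 10 bits go into a high surrogate (U+D800..U+DBFF) and the bottom 10 bits go into a low surrogate (U+DC00..U+DFFF). "High" and "low" describe which bits each half carries. They do not describe which half has the larger numeric value.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # high-order 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)   # low-order 10 bits  -> U+DC00..U+DFFF
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a high and a low surrogate into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1F600 GRINNING FACE as an example
high, low = to_surrogates(0x1F600)
print(hex(high), hex(low))  # prints 0xd83d 0xde00
```

The high surrogate is also the one written first in the UTF-16 code unit sequence, much as the high-order digits of a number are written first.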

> In the Unicode Technical Report #36, Unicode Security Considerations [1] it
> says:
>
>     PEP 383 takes this approach. It enables lossless
>     conversion to Unicode by converting all "unmappable"
>     sequences to a sequence of one or more isolated
>     high surrogate code points. That is, each unmappable
>     byte's value is a code point whose value is 0xDC00
>     plus byte value.
>
> Notice "high surrogate" in that quote. I'm confused. I thought the low
> surrogate range started at 0xDC00, but this document is saying that 0xDC00
> + byte value = high surrogate.  Is that a typo in the document?
>

Yes, that looks wrong: 0xDC00 plus a byte value falls in U+DC00..U+DCFF,
which is in the low-surrogate range. I don't know which of the two PEP 383
actually means.
Please submit a bug report via http://www.unicode.org/reporting.html
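As an illustration, here is a minimal Python 3 sketch using the `surrogateescape` error handler, which is the mechanism PEP 383 added to Python. It assumes CPython's behavior is a fair reference for what PEP 383 intends. An undecodable byte is mapped to the code point 0xDC00 plus the byte value, and the original bytes can be recovered losslessly. The sample bytes are arbitrary.

```python
# Latin-1 'é' (0xE9) is not valid UTF-8 on its own.
raw = b"caf\xe9"

# Decoding with surrogateescape turns the undecodable byte into a lone surrogate.
text = raw.decode("utf-8", errors="surrogateescape")
escaped = ord(text[-1])
print(hex(escaped))  # prints 0xdce9

# 0xDC00 + 0xE9 lies in the low-surrogate range U+DC00..U+DFFF.
assert escaped == 0xDC00 + 0xE9
assert 0xDC00 <= escaped <= 0xDFFF

# Encoding with the same handler restores the original bytes.
assert text.encode("utf-8", errors="surrogateescape") == raw
```

For UTF-8, bytes below 0x80 always decode. The escaped values therefore fall in U+DC80..U+DCFF, and all of them are low surrogates.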

markus
