Hi Folks,

The book Unicode Demystified says this (page 190, first paragraph):

    The surrogate range is divided in half.
    The range from U+D800 to U+DBFF contains
    the "high surrogates," and the range from
    U+DC00 to U+DFFF contains the "low surrogates."

Why are the low surrogates numerically larger than the high surrogates?

That is, why isn't U+D800 to U+DBFF called the low surrogates and U+DC00 to
U+DFFF called the high surrogates?
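
For reference, here is a minimal Python sketch of how, as far as I understand
it, UTF-16 derives a surrogate pair from a code point above U+FFFF. The
function name to_surrogate_pair is my own, used only for illustration, and is
not taken from the book:

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into a UTF-16 surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)         # carries the high-order 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # carries the low-order 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))
```

If this is right, "high" and "low" may describe which bits of the code point
each half carries, not where the ranges sit numerically.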

Unicode Technical Report #36, Unicode Security Considerations [1], says:

    PEP 383 takes this approach. It enables lossless 
    conversion to Unicode by converting all "unmappable" 
    sequences to a sequence of one or more isolated 
    high surrogate code points. That is, each unmappable 
    byte's value is a code point whose value is 0xDC00 
    plus byte value.

Notice "high surrogate" in that quote. I'm confused. I thought the low
surrogate range started at 0xDC00, but this document says that 0xDC00 +
byte value is a high surrogate. Is that a typo in the document?
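
To check the arithmetic, here is a small Python 3 sketch using the
"surrogateescape" error handler, which is the mechanism PEP 383 introduced.
The sample bytes are my own, chosen so that the last byte (0xFF) is not valid
UTF-8:

```python
# Sample input chosen for illustration: 0xFF can never appear in valid UTF-8.
raw = b"abc\xff"

# PEP 383: undecodable bytes become lone surrogate code points.
text = raw.decode("utf-8", errors="surrogateescape")

escaped = ord(text[-1])
print(hex(escaped))

# Compare against the two ranges quoted from the book.
in_high_range = 0xD800 <= escaped <= 0xDBFF
in_low_range = 0xDC00 <= escaped <= 0xDFFF
print(in_high_range, in_low_range)

# Encoding with the same handler restores the original bytes.
round_trip = text.encode("utf-8", errors="surrogateescape")
```

Here the escaped code point is 0xDC00 + 0xFF, which lies in the range the book
calls the low surrogates, not the high ones.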

/Roger   

[1] http://www.unicode.org/reports/tr36/#TOC-PEP-383-Approach

