On Wed, Jan 23, 2013 at 9:45 AM, Costello, Roger L. <[email protected]> wrote:
> Hi Folks,
>
> The book Unicode Demystified says this (page 190, first paragraph):
>
> The surrogate range is divided in half.
> The range from U+D800 to U+DBFF contains
> the "high surrogates," and the range from
> U+DC00 to U+DFF contains the "low surrogates."
>
> Why are the low surrogates numerically larger than the high surrogates?
>
> That is, why isn't U+D800 to U+DBFF called the low surrogates and U+DC00
> to U+DFF called the high surrogates?

The high surrogates contain the high-order bits of the code point, and the
low surrogates contain the low-order bits.

(The last one is U+DFFF, not U+DFF, of course.)

> In the Unicode Technical Report #36, Unicode Security Considerations [1] it
> says:
>
> PEP 383 takes this approach. It enables lossless
> conversion to Unicode by converting all "unmappable"
> sequences to a sequence of one or more isolated
> high surrogate code points. That is, each unmappable
> byte's value is a code point whose value is 0xDC00
> plus byte value.
>
> Notice "high surrogate" in that quote. I'm confused. I thought the low
> surrogate range started at 0xDC00, but this document is saying that 0xDC00
> + byte value = high surrogate. Is that a typo in the document?

Yes, that looks wrong. I don't know which PEP 383 actually uses. Please
submit a bug report via http://www.unicode.org/reporting.html

markus
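
A small Python 3 sketch may make both points concrete. It splits a supplementary code point into its UTF-16 pair, so you can see that "high" and "low" refer to which bits each unit carries and not to numeric order. It then decodes one invalid byte with Python's "surrogateescape" error handler, which is the handler PEP 383 introduced. The helper name to_surrogates and the sample values are made up for this illustration.

```python
def to_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000             # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)    # high surrogate carries the high-order 10 bits
    low = 0xDC00 + (v & 0x3FF)   # low surrogate carries the low-order 10 bits
    return high, low


high, low = to_surrogates(0x1F600)
print(hex(high), hex(low))       # 0xd83d 0xde00

# PEP 383's error handler: an undecodable byte becomes one lone surrogate.
escaped = b"\xff".decode("utf-8", errors="surrogateescape")
print(hex(ord(escaped)))         # 0xdcff, which is 0xDC00 + 0xFF

# Encoding with the same handler restores the original byte.
print(escaped.encode("utf-8", errors="surrogateescape"))  # b'\xff'
```

In this run the escaped byte lands at U+DCFF, which falls in the U+DC00..U+DFFF (low surrogate) range, consistent with reading "high surrogate" in the quoted sentence as a slip.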

