Guido van Rossum writes:

 > I don't think anyone else has that impression. Please cite chapter and
 > verse if you really think this is important. IIUC, UCS-2 does not
 > allow surrogate pairs,

In the original definition of UCS-2 in draft ISO 10646 (1990),
everything in the BMP except for 0xFFFF and 0xFFFE was a character,
and there was no concept of "surrogate" at all.  Later, in ISO 10646
(1993)[1], the Surrogate Area was carved out of the Private Area, but
UCS-2 implementations simply treated surrogates as (single) characters
with special properties.  This was more or less backward compatible, as
all corporate uses of the Private Area used the lower code points and
didn't conflict with the surrogates.  Finally (in 2000 or 2003) the
definition of UCS-2 in ISO 10646 was revised in a backward-incompatible
way to exclude surrogates entirely, i.e., nowadays it is a
range-restricted version of UTF-16.
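
To make "range-restricted version of UTF-16" concrete, here is a rough
sketch in Python.  The helper names are hypothetical (they come from no
standard or library), and it assumes the surrogate range is
0xD800-0xDFFF, as in current Unicode.  It only checks 16-bit code
units; it is not a full validator.

```python
# Illustrative sketch only; helper names are hypothetical.
# Assumption: surrogates occupy 0xD800..0xDFFF, as in current Unicode.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF


def is_surrogate(unit):
    """True if a 16-bit code unit falls in the surrogate range."""
    return SURROGATE_FIRST <= unit <= SURROGATE_LAST


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]


def is_modern_ucs2(units):
    """True if every unit is a BMP value outside the surrogate range,
    i.e. the sequence fits the revised (surrogate-free) UCS-2."""
    return all(0 <= u <= 0xFFFF and not is_surrogate(u) for u in units)
```

Under these assumptions, BMP-only text passes the check, while any
character outside the BMP encodes to a surrogate pair in UTF-16 and so
fails it.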

Footnotes: 
[1]  IIRC, strictly speaking this was done slightly later (1993 or
1994) in an official Amendment to ISO 10646; the Amendment was
incorporated into the standard in 2000.
