On 2008-07-03 19:35, Jeroen Ruigrok van der Werven wrote:
-On [20080703 19:21], Adam Olsen ([EMAIL PROTECTED]) wrote:
On Thu, Jul 3, 2008 at 7:57 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
Please remember that lone surrogate pair code points are perfectly
valid Unicode code points, nevertheless. Just as a lone combining
code point is valid on its own.
That is a big part of these problems.  For all practical purposes, a
surrogate is like a UTF-8 code unit, and must be handled the same way,
so why the heck do they confuse everybody by saying "oh, it's a code
point too!"?

Because surrogate code points are not Unicode scalar values, isolated UTF-16
code units in the range 0xd800-0xdfff are ill-formed. (D91 from Unicode
5.0/5.1, section 3.9)

True. They are not valid UTF-16 code units, but a code unit is
just a storage byte representation of a Unicode tranformation...

"""
Code Unit. The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form. (See definition D77 in Section 3.9, Unicode Encoding Forms.)
"""

That's not the same thing as a code point which is an assignment
of a slot in the Unicode character set...

"""
Code Point. Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF16. (See definition D10 in Section 3.4, Characters and Encoding.)
"""

Reference: http://www.unicode.org/glossary/

Also see Chapter 3.4 
(http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf#G2212):

"""
Surrogate code points and noncharacters are considered assigned code points,
but not assigned characters.
"""

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jul 03 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-07-07: EuroPython 2008, Vilnius, Lithuania             3 days to go

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to