Re: [Python-3000] String comparison

Martin v. Löwis Wed, 13 Jun 2007 13:37:56 -0700

>> Until one or more of the senior developers says otherwise, I'm going
>> to assume that.
> 
> Yeah, what's the difference between code units and points?


A code unit is the smallest fixed-size unit an encoding works with.
It is a single byte in most encodings (UTF-8 included), a 16-bit
quantity in UTF-16, and a 32-bit quantity in UTF-32.
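
As a rough illustration (not part of the original discussion), the
sketch below counts the code units one character needs in each
encoding. It assumes Python 3.3 or later, where str.encode() accepts
these codec names; the helper name code_unit_count is made up for
this example.

```python
def code_unit_count(text, encoding, unit_bytes):
    """Count code units by encoding the text and dividing by the unit size.

    The "-le" codec variants are used so that no byte order mark is
    prepended and counted by mistake.
    """
    return len(text.encode(encoding)) // unit_bytes


g_clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a non-BMP character

utf8_units = code_unit_count(g_clef, "utf-8", 1)       # 8-bit code units
utf16_units = code_unit_count(g_clef, "utf-16-le", 2)  # 16-bit code units
utf32_units = code_unit_count(g_clef, "utf-32-le", 4)  # 32-bit code units

print(utf8_units, utf16_units, utf32_units)
```

The same single code point takes a different number of code units
depending on the encoding chosen.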

A code point is a number in the Unicode code space (U+0000 through
U+10FFFF). A code point that is assigned to a character has a 1:1
relationship with that logical character (in particular, a Unicode
character).

In UCS-2, every code point fits in 16 bits, so you can represent
all BMP characters and nothing beyond them. The low and high
surrogates don't encode characters and are reserved.

In UCS-4, a code point can need more than 16 bits. You can store it
directly in one 32-bit code unit (UTF-32), or you might use UTF-16,
where a single code unit covers every BMP character and two of them
(a surrogate pair) cover each code point above U+FFFF.
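
The split into two code units follows the standard UTF-16 surrogate
arithmetic. The sketch below is only an illustration of that
arithmetic; the function name to_utf16_code_units is hypothetical
and is not anything Python itself exposes.

```python
def to_utf16_code_units(code_point):
    """Return the UTF-16 code units (as ints) for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates do not encode characters")
    if code_point <= 0xFFFF:
        # BMP: one code unit, numerically equal to the code point.
        return [code_point]
    # Above U+FFFF: subtract 0x10000, then split the remaining
    # 20 bits into a high half and a low half of 10 bits each.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


units = to_utf16_code_units(0x1D11E)
print([hex(u) for u in units])  # ['0xd834', '0xdd1e']
```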

Ever since PEP 261, Python admits that the elements of a Unicode
string are code units, and that you might need more than one of
them (specifically, for non-BMP characters in a narrow build)
to represent a code point.
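
To make the consequence concrete, here is a small sketch that takes
the elements a narrow build would hold (modelled here as a list of
16-bit integers, which is an assumption of this example) and
recombines them into code points. The function name
code_points_from_utf16_units is made up for illustration.

```python
def code_points_from_utf16_units(units):
    """Combine surrogate pairs in a sequence of 16-bit code units."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A high surrogate followed by a low one forms one code point.
            high_bits = (unit - 0xD800) << 10
            low_bits = units[i + 1] - 0xDC00
            code_points.append(0x10000 + high_bits + low_bits)
            i += 2
        else:
            # BMP code unit (or a lone surrogate, passed through as-is).
            code_points.append(unit)
            i += 1
    return code_points


# "A" followed by U+1D11E, as three 16-bit elements.
narrow_elements = [0x0041, 0xD834, 0xDD1E]
points = code_points_from_utf16_units(narrow_elements)
print(len(narrow_elements), len(points))  # 3 2
```

The element count (3) is what len() reports when elements are code
units; the number of code points is 2.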

Regards,
Martin
