Re: [Python-3000] String comparison

Stephen J. Turnbull Fri, 08 Jun 2007 01:11:40 -0700

Guido van Rossum writes:

 > If you want to have an abstraction that guarantees you'll never see
 > an unnormalized text string you should design a library for doing so.


OK.

 > (*) It looks like such a library will not have a way to talk about
 > "\u0308" at all, since it is considered unnormalized.

From the Unicode Standard, v4.0, p. 43: "In the Unicode Standard, all
sequences of character codes are permitted."  Since normalization only
applies to characters with decompositions, "\u0308" is indeed valid
Unicode, a one-character sequence in NFC.
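
For anyone who wants to check this, here is a minimal sketch using
Python's standard unicodedata module.  The variable names are mine and
purely illustrative; the only claims being tested are that U+0308 has
no decomposition mapping and that NFC leaves it unchanged when it
stands alone.

```python
import unicodedata

diaeresis = "\u0308"  # COMBINING DIAERESIS, standing alone

# An empty decomposition mapping means normalization has nothing to rewrite.
has_decomposition = unicodedata.decomposition(diaeresis) != ""

# With no preceding base character to compose with, NFC leaves it as-is.
nfc_unchanged = unicodedata.normalize("NFC", diaeresis) == diaeresis

print("has decomposition:", has_decomposition)
print("unchanged by NFC: ", nfc_unchanged)
```

The results depend on the Unicode database shipped with the
interpreter, but I would not expect them to differ for this character.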

AFAIK, the only strings the Unicode standard absolutely prohibits
emitting are those containing code points guaranteed not to be
characters by the standard.  And normalization is simply an internal
technique that allows text operations to be implemented code-point-
wise without fear that emitting them would result in illegal sequences
or other externally visible incompatibilities with the standard.

So there's nothing "wrong by definition" about defining strings as
sequences of code points, and string operations in code-point-wise
fashion.  It just makes that library for Unicode more expensive to
design and operate, and will require auditing and reimplementation of
common libraries (including the standard library) by every program
that requires strict Unicode conformance.
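
To make the cost concrete, here is a rough sketch of what a
code-point-wise string type leaves to the caller.  It assumes
Python 3 str semantics (a sequence of code points); nfc_equal is a
hypothetical helper of my own, not an existing library function.

```python
import unicodedata

precomposed = "\u00fc"        # LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u" + "\u0308"   # plain concatenation yields a non-NFC string

# Code-point-wise comparison sees two different sequences.
codepoint_equal = precomposed == decomposed


def nfc_equal(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


normalized_equal = nfc_equal(precomposed, decomposed)

print(len(precomposed), len(decomposed))   # lengths differ code-point-wise
print(codepoint_equal, normalized_equal)
```

Every place a program concatenates, slices, or compares strings is a
place where it may have to call something like nfc_equal itself, which
is the auditing burden described above.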

