To add yet another dimension to what Michael & Asmus & Ken have said:
In a character encoding, the character is *not* the same thing as a text string
of length 1.
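A quick illustration of why "character" and "string of length 1" come apart. This uses Python's standard `unicodedata` module, which is my choice and was not part of the thread. The same visible letter "é" can be stored as one code point or as two.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT (two code points)

# Both display as "é", but the underlying sequences differ in length and content.
print(len(precomposed), len(decomposed))   # 1 2
print(precomposed == decomposed)           # False

# Naming each code point shows what the decomposed string actually contains.
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
```

So a plain length count or code-point comparison gives no reliable answer to "is this one character, and is it the same character?"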
Character identity is defined in theory by a minimal set of entities needed to
get certain text processes to do the right things ... and in practice by a lot
of blundering around.
Text/sequence equivalence is defined in specific contexts by specific criteria,
under various names from "normalization" to "folding" to "spelling".
In that sense
>The aim of Unicode standardisation is surely to define a single and
>unambiguous representation of text.
is well and truly false. Thus, we can all agree on the letters of the Latin
alphabet for English, abc...xyz -- but we cannot all agree on a single and
unambiguous representation of the word "standardization".
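A small sketch of the distinction, again in Python with the standard `unicodedata` module. The helper `equivalent` is hypothetical and written only for this illustration. It follows the canonical caseless matching recipe from the Unicode Standard (NFD, then full case folding, then NFD again). That equivalence makes some different code point sequences compare equal, but nothing at the Unicode level equates "standardization" with "standardisation". That is a spelling equivalence, and it would need a language-specific rule on top.

```python
import unicodedata

def equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: canonical caseless match, i.e.
    NFD(casefold(NFD(s))) compared for both strings."""
    def key(s: str) -> str:
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)

# Normalization: precomposed vs. decomposed "é", plus a case difference.
print(equivalent("Caf\u00e9", "CAFE\u0301"))             # True
# Case folding: German sharp s folds to "ss".
print(equivalent("STRASSE", "stra\u00dfe"))              # True
# Spelling variants: no normalization or folding form unifies these.
print(equivalent("standardization", "standardisation"))  # False
```

Each notion of "the same text" is a separate, context-specific choice. Picking one (here normalization plus case folding) still leaves other equivalences, such as spelling, unaddressed.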
Joe
- In the future, they will invent a chicken that runs on gasoline -- George
Carlin