It occurred to me that there is one complaint I don't think has been
mentioned: Unicode's dualism with regard to precomposed and decomposed
forms.
In terms of the ideals that drive Unicode's design principles, decomposed
representations should be preferred: they are adequate for representation,
they offer greater flexibility, they require less work to get characters
into the standard, in many cases they are the only option provided, and
for many processes they either allow easier processing or are simply
necessary. But for practical or political reasons, precomposed
representations made their way into the standard, and for practical (though
perhaps short-term) reasons they are considered preferable in Web
protocols. They are also far better supported in existing software. Taken
together, these factors mean that precomposed representations (where they
exist in Unicode) will be established as the norm for data.
The result is a hodgepodge of precomposed representation (wherever
possible) and decomposed representation (where no precomposed form is
available). This makes it harder for software implementations to be done
right (i.e. to allow for either), and there is a very real risk that
software will work only for data in one representation or the other. (For
example, there are fonts and software today that can handle, say,
polytonic Greek only if it is in Normalization Form C.)
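To make the dualism concrete, here is a minimal sketch in Python using the
standard library's unicodedata module (the accented-e example is my own
choice; any canonically equivalent pair would illustrate the same point):

```python
import unicodedata

# The same user-perceived character, two code point sequences:
pre = "\u00e9"       # precomposed: U+00E9 LATIN SMALL LETTER E WITH ACUTE
dec = "e\u0301"      # decomposed: U+0065 + U+0301 COMBINING ACUTE ACCENT

# Naive comparison sees them as different strings...
assert pre != dec

# ...but normalization maps between the two forms:
assert unicodedata.normalize("NFC", dec) == pre
assert unicodedata.normalize("NFD", pre) == dec

def equivalent(a: str, b: str) -> bool:
    """Compare two strings under canonical equivalence by
    normalizing both sides to the same form first."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

assert equivalent(pre, dec)
```

Software that compares raw code point sequences without normalizing first
is exactly the kind that ends up working for only one of the two forms.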
But this was a price that needed to be paid in order to make the standard
practical.
- Peter
---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>