On 7/10/2011 4:58 PM, Ernest van den Boogaard wrote:
For the long term, I suggest Unicode should aim for this:

Unicode 6.5 should claim: There will be a *Unicode dictionary*, limiting and reducing ambiguous semantics within Unicode (Background: e.g. the word "character" will have one single crisp definition, /or/ can be specified to & at any special point).

That kind of terminological purity isn't going to occur. The word "character" has been used ambiguously for decades in the IT industry, and has other general language usage as well.

The Unicode Consortium has a glossary of terms:

http://www.unicode.org/glossary/

to help clarify technical term usage by the Unicode Standard and other specifications, and everybody is welcome to suggest improvements or additions to it. Specific terms of art in the Unicode Standard, such as "code point", "code unit", "scalar value", etc., are used unambiguously. But it is basically hopeless to try to legislate away linguistic ambiguity in a term like "character".
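
As a rough illustration of how those terms of art differ, here is a minimal Python sketch. The sample string and the helper name is_scalar_value are assumptions made up for this example; they do not come from the standard or the glossary.

```python
# Sample text chosen only for illustration: 'A', 'é', and U+1F600,
# which lies outside the Basic Multilingual Plane.
s = "A\u00e9\U0001F600"

# Code points: the integers in the Unicode codespace, one per Python str element.
code_points = [ord(ch) for ch in s]

# Code units: the fixed-width units of a particular encoding form.
utf8_units = list(s.encode("utf-8"))        # 8-bit code units
raw16 = s.encode("utf-16-be")
utf16_units = [int.from_bytes(raw16[i:i + 2], "big")
               for i in range(0, len(raw16), 2)]   # 16-bit code units


def is_scalar_value(cp: int) -> bool:
    """True for any code point that is not a surrogate code point."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


print([f"U+{cp:04X}" for cp in code_points])
print(len(utf8_units), "UTF-8 code units")
print([f"{u:04X}" for u in utf16_units], "UTF-16 code units")
print(is_scalar_value(0xD83D), is_scalar_value(0x1F600))
```

The same three code points take seven code units in UTF-8 and four in UTF-16, and the surrogate code point U+D83D is a code point but not a scalar value.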


Unicode 7.0 should claim: The Unicode definitions will be in distinct, *abstract layers*. (Background: Unicode is not layered; multiple areas of knowledge mix. Just think of how the 7-layer OSI model has benefited the internet industry: separating the frequency from the packet from the byte from the character. More dimensions might be needed, for example to separate normative from informative content).

The Unicode Standard already has what abstract layers its architecture
makes appropriate. See, for example, glyph versus character, and
character encoding versus character encoding form versus character
encoding scheme.

But the Unicode Standard is neither a software system nor a protocol stack,
so trying to apply models appropriate to other realms probably isn't going
to get too far.
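
For readers unfamiliar with the encoding form versus encoding scheme distinction mentioned above, a small Python sketch may help. The sample character is an arbitrary choice for this example, and the codec names are those of Python's standard library, not terms from the standard itself.

```python
import codecs

text = "\u20ac"  # EURO SIGN, picked only as an example

# Encoding form: UTF-16 maps this code point to a single 16-bit code unit.
# (Using ord() this way is valid only because U+20AC is in the BMP.)
code_unit = ord(text)

# Encoding schemes: the same code unit serialized to bytes in a defined order.
be = text.encode("utf-16-be")     # big-endian, no byte order mark
le = text.encode("utf-16-le")     # little-endian, no byte order mark
with_bom = text.encode("utf-16")  # Python writes a BOM, then native byte order

print(f"code unit: {code_unit:04X}")
print("UTF-16BE:", be.hex())
print("UTF-16LE:", le.hex())
print("UTF-16 with BOM:", with_bom.hex())
```

One code unit, three possible byte sequences: the layer that fixes the byte order is separate from the layer that assigns code units.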


Unicode 8.0 should claim: Static information will be defined and published in *XML*. (Background: data, so think tables, lists, have one open standard structure).

This much is *already* available. See UAX #42, Unicode Character Database in XML,
UTS #22, Character Mapping Markup Language, and UTS #35, Unicode Locale
Data Markup Language. The entire CLDR is expressed in XML already, as
is the entire Unicode Character Database.
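
As a sketch of what consuming that XML can look like, the Python below reads a tiny hand-written fragment in the style of UAX #42. The fragment is an abbreviated assumption written for this example (real files carry many more attributes per character), and read_chars is a hypothetical helper name; check the namespace and attribute names against UAX #42 before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespace as I understand it from UAX #42; treat as an assumption to verify.
UCD_NS = "http://www.unicode.org/ns/2003/ucd/1.0"

# Hand-written, heavily abbreviated sample; not copied from a real data file.
SAMPLE = """<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <repertoire>
    <char cp="0041" na="LATIN CAPITAL LETTER A" gc="Lu"/>
    <char cp="0663" na="ARABIC-INDIC DIGIT THREE" gc="Nd"/>
  </repertoire>
</ucd>"""


def read_chars(xml_text: str) -> dict:
    """Map each code point to its (name, general category) pair."""
    root = ET.fromstring(xml_text)
    result = {}
    for char in root.iter(f"{{{UCD_NS}}}char"):
        cp = char.get("cp")
        if cp is None:
            # Entries describing a range of code points have no single cp.
            continue
        result[int(cp, 16)] = (char.get("na"), char.get("gc"))
    return result


chars = read_chars(SAMPLE)
for cp, (name, gc) in sorted(chars.items()):
    print(f"U+{cp:04X} {gc} {name}")
```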

Unicode 9.0 should claim: Processes will be defined and published in *UML* 2.0 (for lack of an open standard)
(Background: think UAX #9 Bidi written in a universal -graphic- language).

This, on the other hand, is not going to happen. The Unicode Standard (and the
other specifications of the Unicode Consortium) is not an object-oriented
software system. Even trying to express the algorithmic specifications such
as the Unicode Normalization Algorithm, the Unicode Bidirectional Algorithm,
the Unicode Collation Algorithm, etc., in UML 2.0, would be a major waste of
effort, IMO. I don't see the UTC going for that at all.

--Ken


I might have the numbering wrong, or even the sequence. But not the main line, is it?

Ernest van den Boogaard
11-Jul-2011
