Re: Unicode 7.0 goals and ++

Ken Whistler Mon, 11 Jul 2011 12:13:36 -0700

On 7/10/2011 4:58 PM, Ernest van den Boogaard wrote:

For the long term, I suggest Unicode should aim for this:
Unicode 6.5 should claim: There will be a *Unicode dictionary*, limiting and reducing ambiguous semantics within Unicode (Background: e.g. the word "character" will have one single crisp definition, /or/ can be specified to & at any special point).

That kind of terminological purity isn't going to occur. The word
"character" has been used ambiguously for decades in the IT industry, and
has other general language usage as well.

The Unicode Consortium has a glossary of terms:

http://www.unicode.org/glossary/

to help clarify technical term usage by the Unicode Standard and other
specifications, and everybody is welcome to suggest improvements or
additions to it. Specific terms of art in the Unicode Standard, such as
"code point", "code unit", "scalar value", etc., are used unambiguously.
But it is basically hopeless to try to legislate away linguistic
ambiguity in a term like "character".
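
To make the three terms of art above concrete, here is a small Python
sketch. It is only an illustration, not text from the Standard; the sample
string and the helper name is_scalar_value are made up for this example.

```python
# Illustrative sample: 'e' + U+0301 COMBINING ACUTE ACCENT + U+1F600.
# A reader might call this "two characters", which is exactly the ambiguity;
# the counts below are not ambiguous.
s = "e\u0301\U0001F600"

# Code points: Python's str is a sequence of code points.
code_points = [ord(c) for c in s]

# Code units: depend on the encoding form (8-bit units for UTF-8,
# 16-bit units for UTF-16).
utf8_units = list(s.encode("utf-8"))
utf16_bytes = s.encode("utf-16-be")
utf16_units = [
    int.from_bytes(utf16_bytes[i:i + 2], "big")
    for i in range(0, len(utf16_bytes), 2)
]

def is_scalar_value(cp: int) -> bool:
    """Hypothetical helper: a scalar value is any code point except a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

print(len(code_points), len(utf8_units), len(utf16_units))
print([hex(u) for u in utf16_units])
```

The UTF-16 code units include a surrogate pair: each surrogate is a code
point but not a scalar value, while U+1F600 itself is both.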

Unicode 7.0 should claim: The Unicode definitions will be in distinct, *abstract layers*. (Background: Unicode is not layered, multiple areas of knowledge mix. Just think of what the 7-layer OSI model has benefited the internet industry: separating the frequency from the packet from the byte from the character. There might be needed more dimensions, like for detailing normative from informative).


The Unicode Standard already has what abstract layers its architecture
makes appropriate. See, for example, glyph versus character, and
character encoding versus character encoding form versus character
encoding scheme.

But the Unicode Standard is neither a software system nor a protocol stack,
so trying to apply models appropriate to other realms probably isn't going
to get too far.
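
As a rough illustration of the encoding form versus encoding scheme
distinction mentioned above, the following Python sketch serializes one
code point both ways. The choice of U+1F600 is arbitrary, and the variable
names are made up for the example.

```python
import struct

ch = chr(0x1F600)  # one abstract character, one code point

# Encoding scheme: a byte serialization, so byte order matters.
be = ch.encode("utf-16-be")
le = ch.encode("utf-16-le")

# Encoding form: the sequence of 16-bit code units, independent of byte order.
units_from_be = struct.unpack(">2H", be)
units_from_le = struct.unpack("<2H", le)

print(be.hex(), le.hex())
print([hex(u) for u in units_from_be])
```

The two byte sequences differ, but both decode to the same pair of UTF-16
code units, which is the layering the Standard already draws.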

Unicode 8.0 should claim: Static information will be defined and published in *XML*. (Background: data, so think tables, lists, have one open standard structure).

This much is *already* available. See UAX #42, Unicode Character Database
in XML, UTS #22, Character Mapping Markup Language, and UTS #35, Unicode
Locale Data Markup Language. The entire CLDR is expressed in XML already,
as is the entire Unicode Character Database.
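
For anyone who wants to try the XML form of the UCD, a minimal Python
sketch follows. The embedded fragment is hand-written for illustration and
is not copied from the published files; the namespace URI and the
attribute names (cp, na, gc) are given as I recall them from UAX #42, so
check the annex before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the UCD in XML (verify against UAX #42).
NS = {"ucd": "http://www.unicode.org/ns/2003/ucd/1.0"}

# Hand-written sample in the general shape of the UCD XML; the real files
# carry many more attributes per <char> element.
SAMPLE = """<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <repertoire>
    <char cp="0041" na="LATIN CAPITAL LETTER A" gc="Lu"/>
    <char cp="0301" na="COMBINING ACUTE ACCENT" gc="Mn"/>
  </repertoire>
</ucd>"""

root = ET.fromstring(SAMPLE)

# Map each code point to its (name, general category) pair.
chars = {
    int(el.get("cp"), 16): (el.get("na"), el.get("gc"))
    for el in root.findall("ucd:repertoire/ucd:char", NS)
}

for cp, (name, gc) in sorted(chars.items()):
    print(f"U+{cp:04X} {gc} {name}")
```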

Unicode 9.0 should claim: Processes will be defined and published in *UML* 2.0 (for lack of an open standard)
(Background: think UAX #9 Bidi written in a universal -graphic- language).

This, on the other hand, is not going to happen. The Unicode Standard (and
the other specifications of the Unicode Consortium) is not an object-oriented
software system. Even trying to express the algorithmic specifications such
as the Unicode Normalization Algorithm, the Unicode Bidirectional Algorithm,
the Unicode Collation Algorithm, etc., in UML 2.0, would be a major waste of
effort, IMO. I don't see the UTC going for that at all.

--Ken

I might have the numbering wrong, or even the sequence. But not the main line, is it?
Ernest van den Boogaard
11-Jul-2011
