On 2011-07-11 21:57, Ken Whistler wrote:

> On 7/10/2011 4:58 PM, Ernest van den Boogaard wrote:
>> For the long term, I suggest Unicode should aim for this:
>>
>> Unicode 6.5 should claim: There will be a *Unicode dictionary*,
>> limiting and reducing ambiguous semantics within Unicode
>> (Background: e.g. the word "character" will have one single crisp
>> definition, /or/ can be specified to & at any special point).
>
> That kind of terminological purity isn't going to occur.

That's possible, even probable, if people who could do the clarification don't want to do it.

> The word "character" has been
used ambiguously for decades in the IT industry, and has other general
language usage as well.

So have many other words. Terminology isn't about changing the meanings of words in everyday language. It's about defining terms, perhaps using common-language words but assigning technical meanings to them.

> The Unicode Consortium has a glossary of terms:
>
> http://www.unicode.org/glossary/

Yes, and it's mostly useful and well-written. But the "definition" of "character" is really a mess. For example, "(1) The smallest component of written language that has semantic value" doesn't make sense. What is the semantic value of the letter "e"? And does that definition answer the question whether "é" is one character or two?
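
To make the ambiguity concrete, here is a minimal Python sketch (the code points are just illustrative examples of mine, not anything taken from the glossary): the same visible "é" can be one code point or two, so a definition of "character" that says nothing about combining sequences leaves the question open.

    import unicodedata

    precomposed = "\u00E9"        # é as one code point, U+00E9 LATIN SMALL LETTER E WITH ACUTE
    decomposed = "\u0065\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

    print(len(precomposed), len(decomposed))    # 1 2 -- one code point vs. two
    print(precomposed == decomposed)            # False -- distinct code point sequences
    # Normalization (NFC) maps the two-code-point sequence to the precomposed form:
    print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True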

"Abstract character" is even worse. "A unit of information used for the organization, control, or representation of textual data." So a bit is a character, isn't it?

> But it is basically hopeless to try to legislate away linguistic
> ambiguity in a term like "character".

Here you're referring to "character" not as a term but as a word in English.

I think part of the problem is that Unicode has widely been misrepresented as providing a unique number (code point) for every character (see e.g. http://www.unicode.org/standard/WhatIsUnicode.html ), and it is difficult to take back such statements - which are an important part of Unicode evangelism. We can keep saying it only if the word "character" is used loosely enough. The statement is effectively a truism: Unicode has a unique number for every code point designated as a character code point (and for other code points, too, of course).
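
As a rough sketch of that distinction (Python again; the sample code points are arbitrary but stable examples I picked), every code point has a unique number, yet its general category tells you whether it is designated as a character at all:

    import unicodedata

    samples = {
        "U+0041": "\u0041",   # LATIN CAPITAL LETTER A -- an assigned character (category Lu)
        "U+FFFE": "\uFFFE",   # a noncharacter code point (category Cn)
        "U+D800": "\ud800",   # a surrogate code point (category Cs), never a character
    }
    for label, cp in samples.items():
        print(label, unicodedata.category(cp))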

--
Yucca, http://www.cs.tut.fi/~jkorpela/
