Sigh! Things were a lot easier back in the old days of Unicode version 3, when default grapheme clusters were still called "glyphs". Okay, so the general public still got it wrong, but that was just because they were ignorant monkeys who didn't know any better, and it was up to the likes of us to teach them the right words for things. :-) Now, instead, we'll have to teach them to say "default grapheme cluster". How long do you think it will be before it's acceptable to describe a console or terminal emulator as being "80 default grapheme clusters wide and 25 default grapheme clusters high"? If I had to guess, I'd say ... never.

Of course, a default grapheme cluster is exactly what Johann was trying to represent in 64 bits in his Excessive Memory Usage Encoding. It's unfortunate that 64 bits just isn't enough for this purpose, since a single cluster can contain an arbitrary number of combining codepoints.
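
(To make that concrete, here's a quick Python sketch. It assumes the third-party "regex" module, which supports \X for matching default grapheme clusters per UAX #29; that isn't in the standard library. Pile on more combining marks and the byte count grows without bound, so no fixed-width unit, 64 bits or otherwise, can hold one cluster.)

    import regex  # third-party; \X matches a default grapheme cluster (UAX #29)

    # One base letter plus three combining acute accents: four codepoints,
    # but a single default grapheme cluster.
    s = "e\u0301\u0301\u0301"

    print(len(s))                      # 4 codepoints
    print(regex.findall(r"\X", s))     # one element: the whole cluster
    print(len(s.encode("utf-32-be")))  # 16 bytes = 128 bits, already past 64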

It would be a whole lot easier if Unicode types would only use the same words for things as the rest of the world. I suggest:
(1) A codepoint is still called a codepoint. No problem there.
(2) The object currently called a "character" be renamed as something like "mapped codepoint" or "encoded codepoint", or possibly (coming in from the other end) something like "sub-character" or "character component" or "characterette" (which can be shortened to "charette" and pronounced "carrot". :-) )
(3) The object currently called a "default grapheme cluster" be renamed as "character".
(4) The object currently called a "tailored grapheme cluster" be renamed as "tailored character".


This would make even /our/ conversations a lot less confusing.
Jill





