On 24/07/13 03:01, Marc Tompkins wrote:
On Tue, Jul 23, 2013 at 7:46 AM, Steven D'Aprano <st...@pearwood.info> wrote:

This is not quite as silly as saying that an English E, a German E and a
French E should be considered three distinct characters, but (in my
opinion) not far off it.


I half-agree, half-disagree.  It's true that the letter "E" is used
more-or-less the same in English, French, and German; after all, they all
use what's called the "Latin" alphabet, albeit with local variations.  On
the other hand, the Cyrillic alphabet contains several letters that are
visually identical to their Latin equivalents, but used quite differently -
so it's quite appropriate that they're considered different letters, and
even a different alphabet.

Correct. Even if they were the same, if legacy encoding systems treated them differently, so would
Unicode. For example, \N{DIGIT FOUR} (U+0034) and \N{FULLWIDTH DIGIT FOUR} (U+FF14) have distinct
code points, even though they represent the same digit, since some legacy East Asian encodings had
separate characters for "full-width" and "half-width" forms.
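
You can check this at the interpreter. What follows is only a rough sketch, assuming Python 3 (where string literals are Unicode) and the standard unicodedata module; the variable names are mine, purely for illustration. It also covers the Latin/Cyrillic lookalikes mentioned above.

```python
import unicodedata

digit = "\N{DIGIT FOUR}"
fullwidth = "\N{FULLWIDTH DIGIT FOUR}"

# Two separate code points, kept apart so that text can round-trip
# through the legacy East Asian encodings.
print(hex(ord(digit)), hex(ord(fullwidth)))   # 0x34 0xff14
print(digit == fullwidth)                     # False

# Compatibility normalization (NFKC) folds the full-width form
# onto the ordinary digit.
folded = unicodedata.normalize("NFKC", fullwidth)
print(folded == digit)                        # True

# Latin and Cyrillic capital A look alike but are different letters;
# normalization does not merge them.
latin_a = "\N{LATIN CAPITAL LETTER A}"
cyrillic_a = "\N{CYRILLIC CAPITAL LETTER A}"
print(latin_a == cyrillic_a)                                  # False
print(unicodedata.normalize("NFKC", cyrillic_a) == latin_a)   # False
```

So the full-width digit is a "compatibility" duplicate that Unicode can fold back onto the plain digit on request, while the Cyrillic letter is a genuinely separate character that stays separate.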

But I confess I have misled you. I wrote about the CJK controversy from memory, 
and I'm afraid I got it completely backwards: the problem is that the glyphs 
(images of the characters) are different, but not the meaning. Mea culpa.

For example, in English, we can draw the dollar sign $ in two distinct ways, with one 
vertical line, or two. Unicode treats them as the same character (as do English 
speakers). "Han Unification" refers to Unicode's choice to do the same for many 
Han (Chinese, Korean, Japanese) ideographs with different appearance but the same 
meaning. For various reasons, some technical, some social, this choice proved to be 
unpopular, particularly in Japan. This issue is nothing new. Unicode supports about
71,000 distinct East Asian ideographs, which is *far* more than the old legacy encodings
were capable of representing, so if there is a Han character that you would like to write
which Unicode doesn't support, chances are that no other encoding system supports it either.
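
To make "same character, different glyph" concrete, here is a small sketch, again assuming Python 3. The ideograph U+76F4 is my own choice of example (it is commonly cited as one that Chinese and Japanese fonts draw differently); the code can only show that there is a single code point, since the actual shape you see depends on the font used to display it.

```python
import unicodedata

# However the dollar sign is drawn (one stroke or two), it is one character.
dollar = "$"
print(hex(ord(dollar)), unicodedata.name(dollar))   # 0x24 DOLLAR SIGN

# A unified Han ideograph: one code point, whichever regional
# glyph style the font happens to use.
han = "\u76f4"
print(len(han), unicodedata.name(han))   # 1 CJK UNIFIED IDEOGRAPH-76F4
```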

More here:

https://en.wikipedia.org/wiki/Han_unification
http://www.unicode.org/faq/han_cjk.html
http://slashdot.org/story/01/06/06/0132203/why-unicode-will-work-on-the-internet



--
Steven
