> http://www.hastingsresearch.com/net/04-unicode-limitations.shtml

I decided to be courteous this time and let others burn this article to a 
crisp before stepping in to blow away the ashes.

There's something rewarding about reading an anti-Unicode article that 
starts, in the first paragraph, by saying that Unicode is "a 16-bit character 
definition allowing a theoretical total of over 65,000 characters."  That 
tells me right away how much accuracy to expect in the rest of the article.  
(I first read about surrogates around 1993.)
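For readers unfamiliar with surrogates: since Unicode 2.0 the standard has reserved two 16-bit ranges that pair up to address characters beyond 65,536. A minimal sketch of the standard surrogate-pair derivation, using U+20000 (the first character of CJK Extension B) as the example:

```python
# Surrogate-pair derivation per the Unicode Standard, showing that
# UTF-16 is not limited to 65,536 characters.
cp = 0x20000                 # U+20000, first CJK Extension B ideograph
assert cp > 0xFFFF           # outside the "16-bit" range the article cites

v = cp - 0x10000
high = 0xD800 + (v >> 10)    # high (lead) surrogate
low = 0xDC00 + (v & 0x3FF)   # low (trail) surrogate
print(hex(high), hex(low))   # 0xd840 0xdc00

# Python's UTF-16 codec produces the same pair: two 16-bit code units.
assert len(chr(cp).encode("utf-16-le")) == 4
```

The mechanism extends the code space to 1,114,112 code points, which is why "a theoretical total of over 65,000 characters" was already wrong when the article was written.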

The paper promises to discuss "the political turmoil and technical 
incompatibilities that are beginning to manifest themselves on the Internet" 
because of the supposed inadequacy of Unicode, but no evidence is ever shown 
that such turmoil and incompatibilities actually exist.  We are simply asked 
to assume that they exist because of the supposed inadequacy.  This is a 
circular argument.

We learn that "Unicode's stated purpose is to allow a formalized font system 
to be generated."  Font system?  Did you know that?

John Cowan has already discussed the scurrilous claim that no Chinese, 
Taiwanese, Koreans, or Japanese were consulted on the design of Unicode 1.0.

The claim of 170,000 total characters is based on disunification of all Han 
characters.  This is simply a disagreement in design between the Unicode 
approach and Goundry.  The decision to unify a traditional Han character 
written with different typographical conventions, instead of coding three 
separate versions for Chinese, Japanese, and Korean, does not mean that two 
characters are somehow missing.

Furthermore, even if 170,000 discrete characters are necessary -- they may 
be, for all I know -- the premise that the Han repertoire is not completely 
specified is presented as evidence not that Unicode is a work in progress, 
but that it willfully ignores the needs of speakers of East Asian languages.  
This is not merely inaccurate, it is irresponsible.

Unicode 3.1 is dismissed with a handwave, on the basis that "two separate 
16-bit blocks do not solve the problem [of inadequate repertoire] at all."  
No technical (or other) justification is attempted to explain why all Han 
characters must appear in a single contiguous block.
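Indeed, software that handles Unicode properly does not care which block a Han character lives in. A small sketch (assuming Python 3's `unicodedata` module) showing that an ideograph in the original BMP block and one in Extension B on plane 2 behave identically:

```python
# Han characters from non-contiguous blocks are processed the same way;
# nothing requires them to share a single 16-bit block.
import unicodedata

uro = "\u4E00"        # U+4E00, CJK Unified Ideographs (BMP)
extb = "\U00020000"   # U+20000, CJK Unified Ideographs Extension B (plane 2)

for ch in (uro, extb):
    assert unicodedata.category(ch) == "Lo"  # both are letters
    assert unicodedata.name(ch).startswith("CJK UNIFIED IDEOGRAPH")
```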

The "analogy" of deleting 25 percent of the Latin alphabet or 75 percent of 
the English vocabulary is completely pointless.  By Goundry's own admission, 
the Chinese writing system is in constant flux, whereas the Latin alphabet 
(and others) has been fixed for centuries.  And the remarks about the French 
being forced to use "the German alphabet" or the English using "a French 
alphabet" left me laughing.  The Latin alphabet, designed for the Latin 
language, was intentionally borrowed by the English, the French, the Germans, 
and speakers of dozens of other languages.

The passage about Verisign is completely irrelevant to the rest of the 
article, to the point where I wondered if it had been pasted in by accident.

After all this, the question left for me to ponder was this:  If Unicode does 
not solve the problem of adequately encoding Han characters, then what 
character set does?  EUC-JP?  Big 5?  GB2312?  Finally, another list member 
mentioned the (grammatically sound) reference, "Hastings been experimenting 
with workarounds."  Somehow I am not left shuddering with fear at the 
impending demise of Unicode.

-Doug Ewell
 Fullerton, California
