Re: Synthetic scripts (was: Re: Private Use Agreements and Unapproved Characters)

Dan Kogai Fri, 15 Mar 2002 09:46:27 -0800

On Friday, March 15, 2002, at 08:48 , Marco Cimarosti wrote:
> O, no! At least one of them has a (super)natural origin: CJK ideographs
> came carved on the shell of a gigantic turtle which appeared in dream to
> Cang Jie. :-)


   That reminds me that Hanzi (Kanji in Japanese) can generate new 
characters simply by combining 'radicals' ('Bushu' in Japanese).  Put 
'heart' (心) next to 'life' (生) and you get 'sex' (性), for instance.
   Unlike the Roman alphabet, which is relatively static, Kanji is a very 
dynamic character set.  So I can't help asking you guys this question:  
how will Unicode cope with this kind of dynamically changing character 
set?
   So far Kanji users have gotten by with a limited number of encoded 
character sets, not because they are content with the current sets but 
because it is so hard to get even one new character added.  When the 
Japanese Industrial Standard (JIS) upgraded JIS X 0208 (first fixed in 
1978, aka Old JIS) in 1990 (New JIS), it created great chaos.  And new 
chaos is likely to arise as JIS X 0212-1990 is upgraded to 
JIS X 0213-2000.
   You may say this can be resolved by regarding each Kanji not as a 
character but as a word (lexically speaking this does make sense) and 
then using some sort of ligature to represent it.  That way you can 
reduce the number of code points down to the number of Bushu.
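
   For what it's worth, Unicode does have something close to this in its 
Ideographic Description Characters (U+2FF0 and following), which describe 
how components are laid out.  Below is a minimal sketch in Python; the 
names IDC_LEFT_TO_RIGHT, IDC_ABOVE_TO_BELOW and describe() are made up 
for illustration.  Note that such a sequence only describes a character: 
renderers are not required to draw it as a single glyph, and it is not 
equal to the encoded character, so it is not a true ligature model.

```python
import unicodedata

# Ideographic Description Characters (operators). Only two of them are
# shown here; the constant names are illustrative, not from any library.
IDC_LEFT_TO_RIGHT = "\u2FF0"   # ⿰  components side by side
IDC_ABOVE_TO_BELOW = "\u2FF1"  # ⿱  components stacked vertically


def describe(operator: str, first: str, second: str) -> str:
    """Build a two-component Ideographic Description Sequence (IDS)."""
    return operator + first + second


# 'heart' (in its side form 忄) next to 'life' (生) describes 性.
ids = describe(IDC_LEFT_TO_RIGHT, "忄", "生")

print(ids, [f"U+{ord(c):04X}" for c in ids])
print(unicodedata.name(ids[0]))

# The description is three code points long and is NOT the same string
# as the single encoded character it describes.
print(len(ids), ids == "性")
```

   The last line is the catch: the sequence and the encoded character 
are different strings, so searching and comparison do not match them 
without extra data.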
   But this approach already failed when Unicode 2.0 decided to give all 
theoretically possible Hangul syllables distinct code points, unlike 
Unicode 1.0, which used a ligature model to represent each one.  As a 
result Hangul now has even more code points than Traditional Chinese.  
With this the Unicode Consortium has lost a good reason to reject new 
proposals to add more characters.  If Elvish gets code points, why 
shouldn't real, living languages get more?
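
   To make "all theoretically possible Hangul" concrete, here is a rough 
Python sketch of the arithmetic the Unicode Standard defines for the 
precomposed syllable block starting at U+AC00: 19 leading consonants x 21 
vowels x 28 trailing slots (including "no trailing consonant") gives 
11,172 code points.  The function names compose_hangul() and 
decompose_hangul() are hypothetical; the constants are the standard ones.

```python
import unicodedata

S_BASE = 0xAC00                      # first precomposed syllable, 가
L_BASE, V_BASE, T_BASE = 0x1100, 0x1161, 0x11A7  # conjoining jamo bases
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28           # T index 0 = no trailing
S_COUNT = L_COUNT * V_COUNT * T_COUNT            # 11172 syllables


def compose_hangul(l: int, v: int, t: int = 0) -> str:
    """Map (leading, vowel, trailing) indices to a precomposed syllable."""
    if not (0 <= l < L_COUNT and 0 <= v < V_COUNT and 0 <= t < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


def decompose_hangul(syllable: str) -> tuple[int, int, int]:
    """Inverse of compose_hangul: recover the three jamo indices."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l, rest = divmod(index, V_COUNT * T_COUNT)
    v, t = divmod(rest, T_COUNT)
    return l, v, t


han = compose_hangul(18, 0, 4)       # ㅎ + ㅏ + ㄴ
print(han, f"U+{ord(han):04X}", decompose_hangul(han))
print(S_COUNT)

# The same syllable written as three conjoining jamo is canonically
# equivalent: normalization converts between the two forms.
jamo = chr(L_BASE + 18) + chr(V_BASE + 0) + chr(T_BASE + 4)
print(unicodedata.normalize("NFC", jamo) == han)
```

   So both models coexist: the jamo sequence and the precomposed 
syllable are interchangeable under normalization, which is exactly what 
Kanji built from Bushu does not have.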
   CJK made the greatest compromise -- one that has hardly paid off -- 
when Unicode was first created.  CJK users accepted code point sharing 
even though it hardly makes sense linguistically.  Then came Unicode 2.0 
and the Hangul expansion, then surrogate pairs.  What's next?  Making 
Unicode 128-bit like an IPv6 address so you can include Tengwar and 
Klingon with less objection?  I can't help but say give me a break.
   I confess I enjoyed this thread on whether Tengwar should be included 
in Unicode.  It's fun.  It's cute.  But isn't this too much for those 
who accepted the compromise for UNIcode?  Tengwar should wait until more 
critical issues are resolved.  Many (including me) would be pissed if 
Tengwar were added BEFORE Ciao-Ciao's poetry and the Man-Yo-Shu become 
encodable in Unicode.
   Well, it may take decades, if not centuries, for Tengwar, Klingon and 
others to get a chance but so what?  They won't go away after all of us 
here are dead.

Dan the Man with Too Many Things to Encode Already
