Re: Synthetic scripts (was: Re: Private Use Agreements and Unapproved Characters)

Dan Kogai Fri, 15 Mar 2002 15:52:35 -0800

On Saturday, March 16, 2002, at 07:27 , Kenneth Whistler wrote:
> *What* still holds true? These are just well-worn issues of itaiji
> (variant forms). The characters from the little anime exhibit of
> variants are, in Unicode:
>
> U+9AD8 / U+9AD9
> U+5516 / U+555E
> U+9593 / U+9592
>
> all variants of the same character that got cloned into Unicode
> because of the source separation rule.


   That is the linguists' view, but the problem is that the government 
sees it otherwise.  Itaiji or not, once a character is registered by the 
government, it becomes canonical and must be used in any legal document.
   When I started a company I had to file a registration with the 
Hou-mu-kyoku, or Legal Registry Office.  Naturally I prepared the 
documents with a text editor, but one board member's name contained an 
itaiji, so I had to leave that part blank and write it in by hand after 
the documents were printed.
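
   To make the point concrete, here is a minimal sketch (Python, standard 
library only) that checks whether the three pairs quoted above stay 
distinct in plain text.  The helper name survives_normalization is my own 
invention for illustration.  It only tests the four Unicode normalization 
forms; it says nothing about fonts or legacy encodings.

```python
import unicodedata

# Variant pairs from the quoted list: (first code point, second code point).
PAIRS = [("\u9ad8", "\u9ad9"), ("\u5516", "\u555e"), ("\u9593", "\u9592")]

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def survives_normalization(a, b):
    """Return True if a and b are still different after every normalization form."""
    return all(
        unicodedata.normalize(form, a) != unicodedata.normalize(form, b)
        for form in FORMS
    )


for a, b in PAIRS:
    # Print code points only, so the output is safe on an ASCII terminal.
    print("U+%04X / U+%04X distinct: %s" % (ord(a), ord(b), survives_normalization(a, b)))
```

   Each of these pairs is encoded as two separate characters, so a name 
written with one of them can be kept apart from the other in plain text.  
The trouble is with variants that have no code point of their own.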

> And the last one is U+5409 "kichi". For this one, I believe the
> variant is simply a zokuji ("vulgar variant") not recognized as
> standard in the dictionaries. But it is just one of thousands
> of similar variant forms which could be attested for itaiji.

   The variant of U+5409 kichi is a zokuji AND legal.

> The whole issue of Han variant forms, by the way, is not something
> that the Unicode Standard created, nor did Han encoding unification
> principles in Unicode and 10646 somehow exacerbate the problem for
> IT processing.

   Right.  But it is also true that Unicode's way of unifying characters 
gets in the way in many cases when you try to put Unicode into 
practice.  As Kato pointed out, Unicode is more pro-programmer than 
pro-user.

> But of course that begs the question of what presentation variation
> detail he or other users perceive to be spelling differences. Correct
> presentation of all details of Han characters may not *be* the
> business of the character encoding per se. There is an architectural
> decision to be made regarding the tradeoff between the identity of
> characters for processing purposes and the appearance of characters
> for rendering purposes, and Kato-san and the IRG appear to disagree
> about where that line should be drawn.

   Right.  I don't know where the line should be drawn either.  But the 
bottom line is that the characters in names should be treated as 
different characters, not as different variants of the same character, 
because names are directly bound to legal documents.  I want, ok, hope, 
ok, wish Unicode to be able to encode legal documents in plain text.

>> favorite appears to be ISO-2022 but as Yet Another Perl Encoding
>> Hacker, ISO-2022 is pain in the arse....
>
> You got that right!

   But when it comes to allocating a new character set, ISO-2022 wins, 
because the authority only has to authorize an escape sequence for the 
new character set and can leave the rest up to the user.
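
   For anyone who has not looked at the bytes, here is a rough sketch of 
what "an escape sequence per character set" means, using Python's 
built-in iso2022_jp codec.  The table holds the four designation 
sequences defined for ISO-2022-JP in RFC 1468.  The function name 
designated_sets is hypothetical, and the scanner is deliberately naive, 
not a full ISO-2022 parser.

```python
ESC = b"\x1b"

# Designation sequences defined for ISO-2022-JP (RFC 1468).
DESIGNATIONS = {
    ESC + b"(B": "ASCII",
    ESC + b"(J": "JIS X 0201-1976 Roman",
    ESC + b"$@": "JIS X 0208-1978",
    ESC + b"$B": "JIS X 0208-1983",
}


def designated_sets(data):
    """Return the character sets designated in data, in order of appearance."""
    found = []
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in DESIGNATIONS:
            found.append(DESIGNATIONS[seq])
            i += 3  # skip the whole three-byte escape sequence
        else:
            i += 1  # ordinary data byte
    return found


# ASCII text followed by one kanji (U+9AD8) that is in JIS X 0208.
encoded = "name: \u9ad8".encode("iso2022_jp")
print(designated_sets(encoded))
```

   A new character set would mean one more entry in a table like this, 
which is the whole attraction, and also why decoders are such a pain to 
write.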

Dan the Lucky Man Whose Name is Encodable by Unicode
