Re: TC/SC mapping

John H. Jenkins Thu, 24 Jan 2002 12:49:29 -0800


On Thursday, January 24, 2002, at 12:29 PM, John Cowan wrote:

> John H. Jenkins wrote:
>
> {TC1, SC1, SC2, TC2, TC3, SC3} constitute a "Han simplification
> class" (HSC), and are all the same when appearing in IDNs.
>
> Correct?
>

Oui.

>
>> The caveat is that this must be understood to be a first-order, 
>> computer-appropriate equivalence and is not in any way to be held to be 
>> a generalized solution to the lexically appropriate conversion between 
>> SC and TC.
>
>
> Is there any danger that these classes will turn out to be a
> "small world", in the sense that we wind up with a few huge classes
> which include almost all the characters?
>

Nope.

>> (Maybe we should refer to *zhengguihua* instead of "Han normalization")
>
>
> Can you explain the joke?
>

It's just to make Ken happy.  He doesn't like me talking about "Han 
normalization," since "normalization" is Unicodespeak for something else.  
"Zhengguihua" is Mandarin for "normalization."

>> It will also mean that we will no longer be able to accept both the TC 
>> and SC form for a character as a candidate for separate encoding in the 
>> future,
>
>
> I don't understand this part.  Since this is neither compatibility nor
> canonical equivalence, it will not affect any of the known normalization
> forms.  Nor are we defining a new normalization form here, since in
> HSCs like the above there is no particular reason to pick any of the
> six characters as *the* normalized form, although by convention we can
> pick one -- say, the one with the smallest Unicode scalar
> value, or the one which appears in the largest number of legacy
> sets -- to aid in description and implementation.
>
> It's just another of those sets of equivalence classes provided for
> special purposes, like the Arabic/Syriac shaping classes or the
> canonical combining classes.
>

Well, first of all, the UTC is already on record as refusing to encode new 
SC separately.

Secondly, we would break IDN equivalence.  If we add a new SC which is 
equivalent to two TC, then suddenly domains which could be distinguished 
on the basis of the old TC pair can't any more.
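
To make the breakage concrete, here is a minimal sketch, assuming the classes are kept as a union-find structure and that each label is folded to its class representative before comparison. The class name HanClasses, the fold method, and the stand-in character names TC_X, TC_Y and SC_NEW are all hypothetical; real code would use code points, and picking the smallest name here only mirrors the "smallest scalar value" convention mentioned in the quoted text.

```python
# Hypothetical sketch: the character names are stand-ins, not real code points.

class HanClasses:
    """Union-find over characters; each set is one simplification class."""

    def __init__(self):
        self.parent = {}

    def find(self, ch):
        # Unknown characters start out as their own singleton class.
        self.parent.setdefault(ch, ch)
        # Walk up to the root, halving the path along the way.
        while self.parent[ch] != ch:
            self.parent[ch] = self.parent[self.parent[ch]]
            ch = self.parent[ch]
        return ch

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller name as root, so the root is always the
            # smallest member of its class (stand-in for scalar value).
            lo, hi = sorted((ra, rb))
            self.parent[hi] = lo

    def fold(self, label):
        """Replace every character of a label by its class representative."""
        return tuple(self.find(ch) for ch in label)


classes = HanClasses()
domain_1 = ["TC_X"]
domain_2 = ["TC_Y"]

# Before the new character exists, the two labels fold differently.
distinct_before = classes.fold(domain_1) != classes.fold(domain_2)

# Add a new SC that is equivalent to both TCs: the two classes merge.
classes.union("SC_NEW", "TC_X")
classes.union("SC_NEW", "TC_Y")

# The same two labels now fold to the same representative.
distinct_after = classes.fold(domain_1) != classes.fold(domain_2)
```

In this sketch distinct_before is True and distinct_after is False: two names that were registered as different become indistinguishable once the new SC joins both classes.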

> Or are you saying that this new information should be represented
> as a Unicode compatibility equivalence?  If so, that would
> wreak havoc with existing NFC and NFKC code.
>

No.

>> (Actually, you could save yourself some grief right off by excluding Han 
>> radicals and all compatibility ideographs.)
>
> This would be a Bad Thing in Korean, though, because the whole point
> of Korean compatibility ideographs is to preserve differences in
> reading.  Or are ideographs not used in (modern) Korean names?
>

These compatibility ideographs are *not* to provide phonetic-specific 
distinctions between various Korean hanja.  They're for compatibility with 
an older standard only, which did make that distinction.  IMHO it would be 
more confusing to Chinese, Japanese, *and* Korean readers to have some 
domain names distinguished when the only thing different about them is 
the Korean pronunciation of the hanja used to write them.

==========
John H. Jenkins
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/
