Re: Unicode. Perl does the right thing?

Edward Cherlin Fri, 25 Oct 2002 22:36:07 -0700

As noted below, a uniform alternate naming scheme for CJK characters requires 
a large table, nearly a dictionary. If we accept the size, Jungshik is right 
that there is a way to do it. In each of Chinese, Japanese, and Korean, 
characters in common use can be disambiguated by giving both the 
pronunciation and a widely-known tag name. Here are some examples from Samuel 
Martin's New Korean-English Dictionary (Han-Mi Dae Sajeon), Minjungseogwan 
1968.


U+751F nal saeng (birth, life)
U+7701 teol saeng (diminish)
U+7272 cimsung saeng (animals)
U+7525 saengcil saeng (nephew)
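
In practice, the scheme above is a lookup table keyed by a meaning-plus-reading tag. The Python sketch below uses a hypothetical in-memory table built only from the four dictionary entries listed here. A real table would need thousands of entries per language. Each tag maps to exactly one code point, and a second tag for the same character is rejected, which is the "choose just one" rule described next. The names `KO_TAGS` and `build_index` are illustrative.

```python
# Hypothetical Korean meaning-reading tags, taken from the four dictionary
# examples above; a real table would be dictionary-sized.
KO_TAGS = [
    ("nal saeng", 0x751F),       # birth, life
    ("teol saeng", 0x7701),      # diminish
    ("cimsung saeng", 0x7272),   # animals
    ("saengcil saeng", 0x7525),  # nephew
]


def build_index(entries):
    """Build tag->code point and code point->tag maps, rejecting ambiguity."""
    by_tag = {}
    by_cp = {}
    for tag, cp in entries:
        # A tag must not name two different characters.
        if tag in by_tag and by_tag[tag] != cp:
            raise ValueError(f"tag {tag!r} names U+{by_tag[tag]:04X} and U+{cp:04X}")
        # A character gets exactly one chosen tag per language.
        if cp in by_cp and by_cp[cp] != tag:
            raise ValueError(f"U+{cp:04X} has two tags: {by_cp[cp]!r}, {tag!r}")
        by_tag[tag] = cp
        by_cp[cp] = tag
    return by_tag, by_cp


tag_to_cp, cp_to_tag = build_index(KO_TAGS)
print(f"U+{tag_to_cp['nal saeng']:04X}")  # U+751F
print(cp_to_tag[0x7272])                  # cimsung saeng
```

A bare reading such as "saeng" would fail this check at once, because it names all four characters.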

This only works for characters currently used in one of the three languages, 
and requires a different name in each language. There is also the problem of 
characters with more than one common tag name in the same language, where 
someone will have to choose just one for each. Nevertheless, it can be done. 
So we could use the form

CJK UNIFIED IDEOGRAPH NAL SAENG
as an alternative to 
CJK UNIFIED IDEOGRAPH-751F
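
Unicode already gives unified ideographs a standard name that is derived from the code point. It is spelled with a hyphen: `CJK UNIFIED IDEOGRAPH-751F`. No table is needed to resolve these names, so only the proposed tag-based form needs one. The Python sketch below resolves either form. `unicodedata.lookup` and `unicodedata.name` are real standard-library calls. The `ALIASES` table and the `resolve` helper are hypothetical stand-ins for the proposed alternative names.

```python
import unicodedata

# Hypothetical alias table for the proposed "CJK UNIFIED IDEOGRAPH <TAG>" names.
ALIASES = {
    "CJK UNIFIED IDEOGRAPH NAL SAENG": 0x751F,
}


def resolve(name: str) -> str:
    """Return the character for a proposed alias or a standard Unicode name."""
    key = name.upper()
    if key in ALIASES:
        return chr(ALIASES[key])
    # Standard names, including algorithmic CJK ideograph names;
    # raises KeyError if the name is unknown.
    return unicodedata.lookup(key)


ch = resolve("CJK UNIFIED IDEOGRAPH NAL SAENG")
assert ch == resolve("CJK UNIFIED IDEOGRAPH-751F")
print(unicodedata.name(ch))  # CJK UNIFIED IDEOGRAPH-751F
```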

On Friday 25 October 2002 02:55 pm, Jungshik Shin wrote:
> On Fri, 25 Oct 2002, Autrijus Tang wrote:
> > On Fri, Oct 25, 2002 at 02:53:43PM +0900, Dan Kogai wrote:
> > > use charnames ":zh";
> > > print "\N{sheng1}";
> >
> > 17 characters from the Big5 range have the 'sheng1' pronunciation;
> > no doubt many more in the Unihan range.
> >
> > > use charnames ":ko";
> > > print "\N{saeng}";
>
>   Needless to say, there are many CJK characters with the Korean
> pronunciation 'saeng', not to mention a Korean Hangul syllable with that
> pronunciation. Besides, some characters have multiple readings.
> So, this doesn't work for Korean, either.
>
> > This "internal code of Han characters" has been discussed in depth
> > here by Mr Zhu Bang-Fu and friends; the consensus is that there's
> > no way to uniquely distinguish one character from another using
> > only a single 'natural' index (Cang-Jie, pinyin, etc.) -- you
> > will end up with fixed ordering ("\N{sheng1-0001}") instead, which
> > is no more legible than "\x{751f}".
>
>   In a sense, it's even worse than "\x{751f}" unless there's a
> machine-readable mapping table (as well as a printed, human-readable one)
> from the sheng1-NNNN names to Unicode code points. Otherwise, one would
> have to refer to the Unicode code chart anyway.
>
>   How about a radical-stroke-pronunciation index? Even with this
> triple index, there may be degeneracies to resolve....
>
>   Another possibility is a 'meaning-pronunciation' index. I believe
> this is one of a few ways to refer to CJK characters (say, over the phone)
> in all CJK countries. However, to do this, we need much more raw data
> (more or less a small dictionary) than the Unihan database provides,
> because it lists meanings of characters in English only.
>
>
>   Jungshik
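
The Python sketch below compares the indexing schemes from the quoted thread on a toy data set. The data is a tiny illustrative sample, not the real Unihan data. It holds Kangxi radical number, residual stroke count, and Mandarin reading for four characters from the examples above. The code shows three things:

- A reading-only key collides.
- The fixed-ordering fallback ("sheng1-0001") only numbers the collisions, so it still needs a machine-readable table back to code points.
- A radical-stroke-pronunciation key happens to separate this particular sample. That says nothing about the full repertoire.

`SAMPLE`, `group_by`, and `degeneracies` are hypothetical names.

```python
from collections import defaultdict

# Illustrative sample only: code point -> (radical, residual strokes, reading).
# A real index would take these fields from the Unihan database.
SAMPLE = {
    0x751F: (100, 0, "sheng1"),
    0x7272: (93, 5, "sheng1"),
    0x7525: (100, 7, "sheng1"),
    0x7701: (109, 4, "sheng3"),
}


def group_by(key_fn):
    """Group code points (in ascending order) by an index key."""
    groups = defaultdict(list)
    for cp in sorted(SAMPLE):
        groups[key_fn(SAMPLE[cp])].append(cp)
    return dict(groups)


def degeneracies(groups):
    """Return the keys that still name more than one character."""
    return {key: cps for key, cps in groups.items() if len(cps) > 1}


# Reading-only index: 'sheng1' is ambiguous.
by_reading = group_by(lambda rec: rec[2])

# Fixed-ordering fallback: unique names, but the numbers carry no meaning.
fixed_names = {
    f"{reading}-{i:04d}": cp
    for reading, cps in by_reading.items()
    for i, cp in enumerate(cps, start=1)
}

# Radical-stroke-pronunciation index: the whole triple is the key.
by_rsp = group_by(lambda rec: rec)

for name, cp in sorted(fixed_names.items()):
    print(name, f"U+{cp:04X}")
```

The loop prints `sheng1-0001 U+7272`, `sheng1-0002 U+751F`, `sheng1-0003 U+7525`, and `sheng3-0001 U+7701`. The serial numbers follow code point order, so they are no more legible than the code points themselves.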

-- 
Edward Cherlin
Generalist
"A knot!" cried Alice. "Oh, do let me help to undo it."
Alice in Wonderland
