Re: Errors in Chinese pronunciations in Unihan

In Unihan-4.0.1d1b.txt:

U+4C5B  kMandarin       XU4M

The trailing "M" is extraneous.  I do not know about the actual
pronunciation of the U+4C5B character, however.  :-)
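
A shape check along these lines might catch stray characters like that
trailing "M".  This is only a sketch: it assumes that each reading is a
run of non-digit characters followed by exactly one tone digit, and the
function name is my own, not part of any Unihan tooling.

```python
import re

# Assumed shape of one reading: non-digit characters, then a single tone digit.
READING = re.compile(r"[^\d\s]+\d")

def malformed_readings(value: str) -> list[str]:
    """Return the whitespace-separated readings that do not match READING."""
    return [r for r in value.split() if READING.fullmatch(r) is None]

print(malformed_readings("XU4M"))        # ['XU4M']
print(malformed_readings("CHAM1 SAM1"))  # []
```

A check like this looks only at the shape of a reading, so it says
nothing about whether a well-formed syllable is the right one for the
character.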

The Cantonese pronunciations of characters in CJK Extension A seem
problematic.  There seems to be a _consistent_ (?) mix-up of "AA" and
"A" (long "a" and short "a").  There also seems to be an _occasional_
(?) mix-up of "J" and "Y" (probably due to the confusion between Yale
and Jyutping romanization?).

For example, if U+3400's kDefinition claims that it is the same as U+4E18,
then it should be pronounced as "YAU1", not "JAAU1".  (I have no idea
about the "KAAU1" reading.)

U+3558 shows another error.  It is listed as "CHAM1 SAM1".  Here, only
CHAM1 is incorrect; it should be listed as "CHAAM1 SAM1" instead.  SAM1
here means Ginseng.  Hmm... speaking of which, its more conventional
forms (U+53C2, U+53C3, U+53C4) are missing the "SAM1" pronunciation as
well as the corresponding "Ginseng" definition!

On the other hand, some "J"s are correct, e.g. "JUNG3" for U+343A.

Some kCantonese pronunciations are joined together.  For instance, the
following grep command yields:

$ grep 'kCantonese.*[0-9][A-Z]' Unihan-4.0.1d1b.txt
U+36D3  kCantonese      CHI1HEI1 DOU1
U+36DB  kCantonese      SAAN1DZAAN3
U+3851  kCantonese      HAU1DZIU2
U+3997  kCantonese      GAAM1GAAM3 KAAM4 NAAP1
U+3BA7  kCantonese      WU1WAAT1
U+3C04  kCantonese      JIN1DZIN3
U+3C7E  kCantonese      GOI1HOI1
U+3C80  kCantonese      DAAI2 JAAN1DZEUN1 SAAN4
U+3C8E  kCantonese      DAAU1 LAAU4 SYU1JYU4
U+3CD9  kCantonese      GYUN1JYUN5
U+3DD1  kCantonese      JAAN1 JIN1 SEUNG1NIM6
U+3E62  kCantonese      GA1GO1
U+3F39  kCantonese      HONG1HONG1
U+4003  kCantonese      DEUI1SEUI1 TEUI4
U+4050  kCantonese      JING1JING3
U+4053  kCantonese      JUNG1GAI3
U+4167  kCantonese      JAAM1JAAM3 JIM3
U+4185  kCantonese      CHI4 JI1DAIK1
U+423E  kCantonese      SAU1SOK3 SE3
U+441F  kCantonese      HONG6 NGAAU1GONG2
U+4492  kCantonese      JAAU4 JIU5 SEUI1WAAI2 TIU4
U+44D6  kCantonese      KEA1WU4 KUNG4
U+4543  kCantonese      JAAM1JAAM3
U+4CC9  kCantonese      DUNG1DAM1 DUNG6
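
The same test as the grep above (a tone digit immediately followed by a
capital letter) could also be used to propose a split.  The following is
a rough sketch, not a vetted fix: it assumes each data line consists of
three whitespace-separated fields (code point, field name, value), and
the helper names split_joined and find_joined are hypothetical.  Any
proposed split would still need checking by someone who knows the
readings.

```python
import re

# Zero-width position between a tone digit and the next capital letter.
JOINED = re.compile(r"(?<=[0-9])(?=[A-Z])")

def split_joined(value: str) -> list[str]:
    """Insert a space at every digit/letter boundary, then split on whitespace."""
    return JOINED.sub(" ", value).split()

def find_joined(lines):
    """Yield (code point, original value, proposed readings) for suspect lines."""
    for line in lines:
        fields = line.strip().split(None, 2)
        if len(fields) != 3 or fields[1] != "kCantonese":
            continue
        if JOINED.search(fields[2]):
            yield fields[0], fields[2], split_joined(fields[2])

sample = [
    "U+36D3  kCantonese      CHI1HEI1 DOU1",
    "U+343A  kCantonese      JUNG3",
]
for code, value, proposed in find_joined(sample):
    print(code, value, "->", " ".join(proposed))
# U+36D3 CHI1HEI1 DOU1 -> CHI1 HEI1 DOU1
```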

I also caught the following error by chance:

U+4C8E  kCantonese      NEOYU5

What is a good place to discuss these issues?  And which people and
which sources were involved, especially with the CJK Extension A
kCantonese data?  It would be nice to talk with the original
contributors to find out how these errors crept in: errors in the
original source?  Systematic errors from converting between
romanizations, e.g. Jyutping to Yale?  Inappropriate use of "Fanqie"?
Other human errors?  Knowing that would help us find a good way to
correct these mistakes.

Furthermore, is there something like CVS web or changelogs to see the
history of modifications of Unihan?  (when, by whom, and why, from what
source, etc.)  What other fixes have been done to Unihan.txt since
19 June 2003?

Many thanks!

Anthony Fok

-- 
Anthony Fok Tung-Ling
ThizLinux Laboratory   <[EMAIL PROTECTED]> http://www.thizlinux.com/
Debian Chinese Project <[EMAIL PROTECTED]>       http://www.debian.org/intl/zh/
Come visit Our Lady of Victory Camp!           http://www.olvc.ab.ca/
