On October 7, 2003, at 9:04 AM, Andrew C. West wrote:


The failings of the Unihan database have been the subject of much discussion in
the past, especially the kMandarin field which got rather mangled in Unicode
3.1. Happily the 4.0.1d1 version of Unihan fixes most of the kMandarin problems,
although the quality of many of the provided Mandarin readings still leaves much
to be desired. (The Mandarin readings really need to be completely overhauled,
based on a single authoritative source such as _Hanyu Da Zidian_ ... but that's
just my personal opinion).



I think it's a reasonable suggestion, but it raises the usual question whenever issues with Unihan.txt come up: who's going to do the work?


With Cantonese, of course, we've got a whole other mess to deal with, since there is no single, reasonably authoritative source, and while we're trying to base the Cantonese readings on solid authorities, it isn't hard to come up with instances where they disagree, particularly on the tone. And occasionally we have to resort to the "man in the street" (or the disembodied voice on the Hong Kong subway), since the characters just haven't made it into any dictionary. (E.g., does anyone know how to pronounce U+40DF?)

And the Japanese and Korean readings need to be overhauled as well.

Not to mention the kDefinition field. If nothing else, it needs to be able to distinguish general use, general Chinese, Mandarin, classical Chinese, Cantonese, Japanese, Korean, and Vietnamese usages, plus, of course, other Chinese dialects or non-standard forms.
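
For anyone who wants to see how uneven the coverage is across these fields, here is a minimal sketch in Python. It assumes the tab-separated Unihan.txt layout (code point, field name, value, with '#' comment lines). The sample values and the helper names parse_unihan and missing_field are illustrative only and have not been checked against any particular release of the file.

```python
from collections import defaultdict

# Sample lines in the assumed Unihan.txt layout: code point, field, value,
# separated by tabs. The values are placeholders for illustration only.
SAMPLE = """\
# comment lines start with '#'
U+4E00\tkMandarin\tYI1
U+4E00\tkCantonese\tjat1
U+4E00\tkDefinition\tone; a, an; alone
U+4E8C\tkMandarin\tER4
"""

def parse_unihan(text):
    """Return {code point: {field: value}} from Unihan-style text."""
    entries = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first two tabs only, so the value is kept whole.
        code_point, field, value = line.split("\t", 2)
        entries[code_point][field] = value
    return entries

def missing_field(entries, field):
    """List the code points that have no value for the given field."""
    return sorted(cp for cp, fields in entries.items() if field not in fields)

entries = parse_unihan(SAMPLE)
print(missing_field(entries, "kCantonese"))
```

Running the same check over the real file for kCantonese, kDefinition, and the Japanese and Korean reading fields would give a rough idea of how much work is outstanding, though it says nothing about whether the readings that are present are any good.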

========
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



