On Thursday, December 19, 2002, at 06:05 AM, Andrew C. West wrote:
Unicode 4.0 timeframe. We'll also try to get the preferred Mandarin (and possibly Cantonese) readings marked by then.

- Any estimates for when it will be possible to publish a fixed version?

I'll let Mr. Jenkins answer that one.

This is what's weird about this whole thing. I can't figure out how the corruption took place between Unicode 3.0 and 3.1. At least it'll make it easier to fix.

- Any suggestion for interim work-arounds (e.g., an older version of the file, an alternative source)?

Use the Unihan database for Unicode 3.0 at
http://www.unicode.org/Public/3.0-Update/Unihan-3.txt
This is the latest uncorrupted version.
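
For anyone who wants to pull just the readings out of the older file, here is a minimal Python sketch. It assumes the file uses the usual Unihan layout of one tab-separated entry per line (code point, field name, value) with comment lines starting with "#". The function name parse_unihan_readings and the sample lines are made up for illustration; the sample values are not copied from the real file.

```python
from collections import defaultdict


def parse_unihan_readings(lines, fields=("kMandarin", "kCantonese")):
    """Collect the requested reading fields from Unihan-style lines.

    Assumes each data line looks like: U+XXXX<TAB>kField<TAB>value
    Returns {code point: {field: [readings]}}.
    """
    readings = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a data line in the assumed layout
        codepoint, field, value = parts
        if field in fields:
            # A field may list several readings separated by spaces.
            readings[codepoint][field] = value.split()
    return dict(readings)


# Illustrative sample lines only (hypothetical, not taken from the file).
sample = [
    "# comment line",
    "U+4E00\tkCantonese\tJAT1",
    "U+4E00\tkMandarin\tYI1",
    "U+4E00\tkTotalStrokes\t1",
]
result = parse_unihan_readings(sample)
print(result)
```

To run it against a local copy of the file, pass an open file object instead of the sample list, for example open("Unihan-3.txt", encoding="utf-8"); both the local file name and the encoding are assumptions here.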
Meanwhile, one caveat regarding the pronunciations supplied in the Unihan database. While we do try to be accurate and careful and while we do try to use reliable sources, we are not lexicographers ourselves, and there's not much we can do when our sources don't agree. For Mandarin this is a fairly minor problem, but it's a bit more extensive for Cantonese. One cause of this is that languages are moving targets, and the pronunciations themselves can change over time. Another is that sometimes people extrapolate the pronunciation for one dialect from the pronunciation in another, or from the pronunciation given in a classical dictionary such as the KangXi. And, for Cantonese in particular, sometimes characters are new enough that we can't go to dictionaries but have to rely on the "man in the street" for the pronunciation (we had a case like this come up in the last IRG). And sometimes our fingers just trip over each other while we type.
While I think the readings we provide are useful and an important adjunct to the Unihan database, I'm not sure I'd want to use these readings if I were developing a commercial-grade product or writing a scholarly treatise.
==========
John H. Jenkins
[EMAIL PROTECTED]
http://www.tejat.net/

