On Monday, February 4, 2002, at 07:21 AM, Marco Cimarosti wrote:

> In the on-line UniHan database (http://www.unicode.org/charts/unihan.html) I
> see a field that I have never seen before:
>
>       "-      Other useful dictionary-like data
>               -       [...]
>               -       A phonetic grouping for the character"
>
> The phonetic grouping seems to be an integer number, and I wonder:
>
> - What does this information mean?
>
> - Why do some characters not have it? Is it just missing, or does it not
> apply to them?
>
> - Where does it come from? I have not seen a corresponding field in the
> plain-text file UniHan.txt.
>

You need the latest Unihan.txt.  In there you have:

#       kPhonetic*
#               The phonetic index for the character from _Ten Thousand Characters: An
#               Analytic Dictionary_ by G. Hugh Casey, S.J. Hong Kong: Kelley and Walsh,
#               1980.

The asterisk indicates that it's a field we're still populating.

> I also take the occasion to suggest a new field that could be very useful:
> the frequency of usage of each character. This information may be derived
> from good on-line sources. E.g., for Chinese, from Chi-Ho Tsai's research
> (http://www.geocities.com/hao510/charfreq/) and, for Japanese, from the
> KanjiDic database (http://www.csse.monash.edu.au/~jwb/kanjidic_doc.html).
> (I don't know the licensing terms for using these data.)
>
>

We also have a newish kFrequency field.

#       kFrequency
#               A rough frequency measurement for the character based on analysis of Chinese
#               USENET postings
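
If it helps, here is a minimal sketch of how one might pull these two fields out of Unihan.txt. It assumes the data lines are tab-separated as `U+XXXX<TAB>field<TAB>value` and that lines starting with `#` are comments like the descriptions quoted above. The function name `read_unihan_fields` and the sample values are hypothetical, made up for illustration rather than copied from the file.

```python
import io

def read_unihan_fields(lines, wanted=("kPhonetic", "kFrequency")):
    """Collect the wanted fields for each character from Unihan-style lines."""
    result = {}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' comments (e.g. the field descriptions).
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        # Assumed layout: code point, field name, value.
        if len(parts) != 3 or not parts[0].startswith("U+"):
            continue
        code_point, field, value = parts
        if field in wanted:
            char = chr(int(code_point[2:], 16))
            result.setdefault(char, {})[field] = value
    return result

# Illustrative sample only; these values are invented, not real Unihan data.
SAMPLE = (
    "# kFrequency\n"
    "U+4E00\tkFrequency\t1\n"
    "U+4E00\tkTotalStrokes\t1\n"
    "U+4E01\tkPhonetic\t1257\n"
)

table = read_unihan_fields(io.StringIO(SAMPLE))
```

With the real file you would pass an open file object instead of the `io.StringIO` sample. A character that has no kPhonetic or kFrequency line simply has no entry for that field, which is what you would expect for a field that is still being populated.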

==========
John H. Jenkins
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/

