On Monday, February 4, 2002, at 07:21 AM, Marco Cimarosti wrote:
> In the on-line UniHan database (http://www.unicode.org/charts/unihan.html)
> I see a field that I have never seen before:
>
> "- Other useful dictionary-like data
> - [...]
> - A phonetic grouping for the character"
>
> The phonetic grouping seems to be an integer number, and I wonder:
>
> - What does this information mean?
>
> - Why some characters don't have it? Is it just missing or it does not
> apply to them?
>
> - Where does it come from? I have not seen a corresponding field in the
> plain-text file UniHan.txt.
>

You need the latest Unihan.txt. In there you have:

# kPhonetic*
# The phonetic index for the character from _Ten Thousand Characters: An
# Analytic Dictionary_ by G. Hugh Casey, S.J. Hong Kong: Kelley and Walsh,
# 1980.

The asterisk indicates that it's a field we're still populating.

> I also take the occasion to suggest a new field that could be very useful:
> the frequency of usage of each character. This information may be derived
> from good on-line sources. E.g., for Chinese, from Chi-Ho Tsai's research
> (http://www.geocities.com/hao510/charfreq/) and, for Japanese, from the
> KanjiDic database (http://www.csse.monash.edu.au/~jwb/kanjidic_doc.html).
> (I don't know the licensing terms for using these data.)
>

We also have a newish kFrequency field.

# kFrequency
# A rough frequency measurement for the character based on analysis of Chinese
# USENET postings

==========
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/
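
For readers who want to check which characters carry a kPhonetic or kFrequency value, here is a minimal sketch of reading one field out of Unihan.txt. It assumes the usual tab-separated data lines of the form "U+XXXX<TAB>field<TAB>value", with "#" starting comment lines. The function name read_unihan_field and the sample rows are made up for illustration and are not real Unihan data.

```python
import io


def read_unihan_field(lines, field):
    """Return {code point: value} for one field from Unihan-style lines.

    Assumes each data line is "U+XXXX<TAB>field<TAB>value"; lines that
    are blank, start with "#", or do not have three columns are skipped.
    """
    values = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue
        code, name, value = parts
        if name == field and code.startswith("U+"):
            values[int(code[2:], 16)] = value
    return values


# Invented sample rows (illustrative values only, not real Unihan data).
SAMPLE = (
    "# kPhonetic*\n"
    "U+4E00\tkFrequency\t1\n"
    "U+4E00\tkPhonetic\t1499\n"
    "U+4E01\tkPhonetic\t1257\n"
)

phonetic = read_unihan_field(io.StringIO(SAMPLE), "kPhonetic")
for cp in (0x4E00, 0x4E01, 0x4E02):
    # A character with no entry simply has no line for that field.
    print(f"U+{cp:04X}", phonetic.get(cp, "(no kPhonetic entry)"))
```

With a real file, the same function could be called on an open file object, e.g. open("Unihan.txt", encoding="utf-8"). A character that lacks the field has no line for it, so from the file alone one cannot tell whether the value does not apply or has not been populated yet.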