Re: Unihan DB / kKarlgren / kFrequency.

John H. Jenkins Tue, 25 Feb 2003 08:18:10 -0800

On Sunday, February 23, 2003, at 08:50 AM, Pierpaolo BERNARDI wrote:

In the Unihan-3.2.0.txt file the field kKarlgren is described as:

#  The index of this character in _Analytic Dictionary of Chinese and
#   Sino-Japanese_ by Bernhard Karlgren, New York: Dover Publications,
#   Inc., 1974.
#  If the index is followed by an asterisk (*), then the index is an
#   interpolated one, indicating where the character would be found
#   if it were to have been included in the dictionary.

However, in the file there are the following records:

U+5374 kKarlgren 506A
U+630C kKarlgren 411A
U+811A kKarlgren 506A
U+8173 kKarlgren 506A
U+993C kKarlgren 333A-

So, either the description of the field is incomplete, or the data
is incorrect.

If you check Karlgren's dictionary, you'll find that while most of the indices are integers, some are integers followed by an "A". This is common in East Asian dictionaries with numbered entries; it typically happens when the basic numeric indices have already been assigned and an out-of-order entry is then discovered. In such a case, rather than renumbering every index, the editors add an interpolated index.
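
For anyone processing the field mechanically, here is a minimal sketch of a parser for the two suffixes mentioned in this thread. It assumes a value is a run of digits, optionally followed by "A", optionally followed by "*", in that order; that ordering is my assumption, not something the field description states. The names parse_karlgren and parse_record are made up for illustration. The trailing "-" in 333A- is not covered by the description, so the sketch reports that value as unmatched rather than guessing at its meaning.

```python
import re

# Assumed shape of a kKarlgren value: digits, an optional "A" suffix,
# then an optional "*" marker. The ordering is an assumption.
KARLGREN_RE = re.compile(r"^(\d+)(A?)(\*?)$")


def parse_karlgren(value):
    """Return (number, has_a_suffix, has_asterisk), or None if the value does not match."""
    m = KARLGREN_RE.match(value)
    if m is None:
        return None
    return int(m.group(1)), m.group(2) == "A", m.group(3) == "*"


def parse_record(line):
    """Split a record such as 'U+5374 kKarlgren 506A' into its three fields."""
    code_point, field, value = line.split(None, 2)
    return code_point, field, value.strip()


# Three of the records quoted in the message above.
records = [
    "U+5374 kKarlgren 506A",
    "U+630C kKarlgren 411A",
    "U+993C kKarlgren 333A-",
]

for line in records:
    code_point, field, value = parse_record(line)
    print(code_point, value, parse_karlgren(value))
```

Returning None for anything unexpected keeps odd values such as 333A- visible for manual review instead of silently coercing them.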

----------------------------------------------------

The field kFrequency is described as:

#  A rough fequency [sic] measurement for the character based
#  on analysis of Chinese USENET postings

without further explanation. The field contains one of 1, 2, 3, 4, 5.
I'd like to know, roughly, what these numbers mean.

Roughly, characters with a frequency of 1 are more commonly used than those with a frequency of 2, and so on.
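
In other words, the value is an ordinal rank in which smaller means more common. As a rough sketch of how one might use it, assuming only the range 1 to 5 stated in the question, the following orders characters from most to least common. The function names, labels, and sample ranks are made up for illustration and are not taken from the Unihan file.

```python
def parse_frequency(value):
    """Convert a kFrequency value to an int, rejecting anything outside 1..5."""
    rank = int(value)
    if not 1 <= rank <= 5:
        raise ValueError(f"unexpected kFrequency value: {value!r}")
    return rank


def most_common_first(ranks):
    """Sort labels so that rank 1 (most common) comes first; ties sort by label."""
    return sorted(ranks, key=lambda label: (ranks[label], label))


# Hypothetical labels and ranks, made up for illustration only.
ranks = {
    "char_a": parse_frequency("3"),
    "char_b": parse_frequency("1"),
    "char_c": parse_frequency("2"),
}

print(most_common_first(ranks))
```

Since the ranks are only rough buckets, the sketch compares them but does not treat the numbers as counts or proportions.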

==========
John H. Jenkins
[EMAIL PROTECTED]
http://www.tejat.net/

