Mr. Nohejl,

Regarding the property data you mention below: kRSUnicode permits 
multiple (space-delimited) variant radical/stroke values, and I think we will 
see important variants added in the future. Where a specific value attested in 
a specific Kangxi edition is missing from kRSUnicode, it would indeed be useful 
to add it, and perhaps to give it priority (move it to the front of the list). 
Likewise, if a common variant value is missing (even one not associated with 
Kangxi), it might be added for convenience. And if there are any outright 
errors, of course those should be identified and corrected (but clear errors 
are harder to find these days). 
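
As a rough illustration of what a multi-valued kRSUnicode field looks like to 
a program, here is a minimal Python sketch. It is not the tooling used for the 
UCD or at Wenlin; the regular expression is my assumption about the general 
shape of a value (radical number, optional apostrophe marking a simplified 
radical, a dot, then the residual stroke count), and the names parse_rs and 
add_variant are hypothetical. The example values are the ones you quote for 
U+6700, and the edit shown is illustrative only, not a decided change.

```python
import re

# Assumed shape of one radical/stroke value: radical number, optional
# apostrophe(s) for a simplified radical form, ".", residual stroke count.
RS_VALUE = re.compile(r"^(\d{1,3})('*)\.(-?\d{1,2})$")


def parse_rs(field):
    """Split a space-delimited field into (radical, simplified, strokes) tuples."""
    values = []
    for token in field.split():
        m = RS_VALUE.match(token)
        if m is None:
            raise ValueError(f"unrecognized radical/stroke value: {token!r}")
        values.append((int(m.group(1)), bool(m.group(2)), int(m.group(3))))
    return values


def add_variant(field, value, prioritize=False):
    """Return the field with `value` present; optionally move it to the front."""
    tokens = field.split()
    if prioritize:
        # Put the value first and drop any later duplicate of it.
        tokens = [value] + [t for t in tokens if t != value]
    elif value not in tokens:
        tokens.append(value)
    return " ".join(tokens)


# Illustrative only: giving the Kangxi value for U+6700 priority.
updated = add_variant("13.10", "73.8", prioritize=True)
parsed = parse_rs(updated)
```

Keeping the field as an ordered list is the point of the sketch: the first 
value is the one given priority, and later values are variants.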

Note that because kRSUnicode covers *all* Unihan CJK, including characters not 
present in the original Kangxi, some of the radical/stroke values are so-called 
"virtual" assignments (those should be omitted from consideration when proofing 
original Kangxi data).

Several years ago we (at Wenlin.com) produced consolidated Kangxi data for our 
Zidian (Wenlin 4.X), taking these four properties (among other data) as input:

<http://www.unicode.org/reports/tr38/#kIRGKangXi>
<http://www.unicode.org/reports/tr38/#kKangXi>
<http://www.unicode.org/reports/tr38/#kRSKangXi> 
<http://www.unicode.org/reports/tr38/#kIRG_GSource>

The last of these may not have any obvious connection with Kangxi, until one 
reads the kIRG_GSource property description and sees this "sub-property" 
description:

"GKX Kangxi Dictionary ideographs (康熙字典) 9th edition (1958) including the 
addendum (康熙字典)補遺"

PRC researchers have done much work proofing G-Source Kangxi data, to address 
many aspects of the complex original text. 

The Kangxi work we did at Wenlin has several dimensions, and some of it has 
not yet been fed back into the UCD.

We have in fact already identified many important omissions from kRSUnicode, 
which we plan to propose for a future data release. 

Since kRSUnicode is a Normative property, a formal proposal to modify that data 
is required, for review in WG2. I have added notes on the items you mention 
below, for consideration in that process, and in the meantime, if you identify 
any other issues, please bring them to our attention.
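
For anyone who wants to look for further cases of this kind, the comparison 
can be sketched in a few lines of Python. This is only a sketch under stated 
assumptions: it expects tab-separated lines of the form "U+XXXX, property, 
value" (with "#" comment lines) as in the Unihan data files, it compares only 
the first space-delimited value of each field, and the function name 
radical_mismatches is hypothetical. It does not try to recognize "virtual" 
assignments, so its output still needs review by hand. The sample data are the 
four items from your message plus U+4E00 as a matching control.

```python
def radical_mismatches(lines):
    """Yield (codepoint, kangxi_radical, unicode_radical) where the two differ."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, prop, value = line.split("\t", 2)
        if prop in ("kRSKangXi", "kRSUnicode"):
            # Only the first space-delimited value is compared here.
            first = value.split()[0]
            radical = first.split(".")[0].rstrip("'")
            table.setdefault(codepoint, {})[prop] = int(radical)
    # Report in numeric code point order.
    for codepoint in sorted(table, key=lambda cp: int(cp[2:], 16)):
        kx = table[codepoint].get("kRSKangXi")
        uni = table[codepoint].get("kRSUnicode")
        if kx is not None and uni is not None and kx != uni:
            yield codepoint, kx, uni


sample = [
    "# sample lines taken from this thread",
    "U+4E00\tkRSKangXi\t1.0",
    "U+4E00\tkRSUnicode\t1.0",
    "U+6700\tkRSKangXi\t73.8",
    "U+6700\tkRSUnicode\t13.10",
    "U+9F9D\tkRSKangXi\t115.16",
    "U+9F9D\tkRSUnicode\t213.5",
    "U+4E80\tkRSKangXi\t213.0",
    "U+4E80\tkRSUnicode\t5.10",
    "U+66FB\tkRSKangXi\t72.7",
    "U+66FB\tkRSUnicode\t73.7",
]
mismatches = list(radical_mismatches(sample))
```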

-Richard

PS: Please note that despite the "CJK stroke order" subject line of your 
message, we are not talking about CJK stroke order here at all, but about 
Kangxi and UCS radical assignment, and residual 
stroke *count* data. Such data can indeed be used to "order" (collate) CJK 
data, but "stroke order" is a separate issue, involving the particular sequence 
of CJK Strokes (see The Unicode Standard, Appendix F) in the writing of a given 
character (stroke-order data can also be used for collation and indexing). 
Wenlin's CDL database (which inspired the CJK Stroke block, and also produced 
Appendix F) contains a comprehensive analysis of CJK Stroke order *and* 
Radical/Stroke data for all UCS CJK, primarily focused on PRC norms, but also 
including a great many variants (variant forms, variant stroke counts, and 
variant radical assignments).
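
To make the distinction concrete, collating by radical and residual stroke 
count needs nothing more than a sort key, as in the minimal Python sketch 
below. It uses the kRSUnicode values quoted in this thread; the function name 
rs_sort_key and the code point tie-break are my own assumptions for 
illustration, not a description of any standard collation. No stroke-order 
data is involved at all.

```python
def rs_sort_key(entry):
    """Sort key: (radical number, residual stroke count, code point)."""
    char, rs = entry
    # Use only the first space-delimited radical/stroke value.
    radical, strokes = rs.split()[0].split(".")
    return (int(radical.rstrip("'")), int(strokes), ord(char))


# kRSUnicode values as quoted in this thread.
entries = [("最", "13.10"), ("龝", "213.5"), ("亀", "5.10"), ("曻", "73.7")]
ordered = [char for char, _ in sorted(entries, key=rs_sort_key)]
```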


On Feb 28, 2014, at 10:56 AM, Adam Nohejl wrote:

> 
> (1) A very common character for "most, maximum".
> 最[U+6700]     kRSKangXi       73.8
> 最[U+6700]     kRSUnicode      13.10
> 
> (2) A funny character for autumn containing the turtle component.
> 龝[U+9F9D]     kRSKangXi       115.16
> 龝[U+9F9D]     kRSKanWa        115.16
> 龝[U+9F9D]     kRSUnicode      213.5
> 
> There are also characters that actually are not included in the Kang Xi 
> dictionary**, but the Unihan data contain both a purported Kang Xi radical 
> and in addition to that a _different_ Unicode radical.
> 
> (3) The simplified turtle character (commonly assigned to the traditional 
> radical #213):
> 亀[U+4E80]     kRSKangXi       213.0
> 亀[U+4E80]     kRSUnicode      5.10
> 
> (4) Character with the radical #72/73 at the top, i.e. IMHO an arbitrary 
> decision, but unexpectedly the fields differ:
> 曻[U+66FB]     kRSKangXi       72.7
> 曻[U+66FB]     kRSUnicode      73.7


> Hello,
> 
> I am comparing radical data for CJK characters from different sources, 
> including the Unihan database. According to the Unihan documentation* the 
> kRSUnicode radical should correspond to kRSKangXi radical, which in turn 
> should be based on the Kang Xi dictionary.
> 
> Is there any explanation for the following discrepancies? Did I miss any 
> other rules or reasoning behind the content of these two fields?
> 
> Examples of the discrepancies:
> 
> (1) A very common character for "most, maximum".
> U+6700        kRSKangXi       73.8
> U+6700        kRSUnicode      13.10
> 
> (2) A funny character for autumn containing the turtle component.
> U+9F9D        kRSKangXi       115.16
> U+9F9D        kRSKanWa        115.16
> U+9F9D        kRSUnicode      213.5
> 
> There are also characters that actually are not included in the Kang Xi 
> dictionary**, but the Unihan data contain both a purported Kang Xi radical 
> and in addition to that a _different_ Unicode radical.
> 
> (3) The simplified turtle character (commonly assigned to the traditional 
> radical #213):
> U+4E80        kRSKangXi       213.0
> U+4E80        kRSUnicode      5.10
> 
> (4) Character with the radical #72/73 at the top, i.e. IMHO an arbitrary 
> decision, but unexpectedly the fields differ:
> U+66FB        kRSKangXi       72.7
> U+66FB        kRSUnicode      73.7
> 
> - - -
> 
> [*] <http://www.unicode.org/reports/tr38/tr38-8.html>: "Property: kRSUnicode 
> // Description: (...) The first value is intended to reflect the same radical 
> as the kRSKangXi field and the stroke count of the glyph used to print the 
> character within the Unicode Standard."
> 
> [**] The two characters are missing from the '89 edition of Kang Xi (which 
> should be the same as used for Unihan) according to search on this site: 
> <http://ctext.org/dictionary.pl>
> 
> 
> -- 
> Adam Nohejl
> 
> 
> _______________________________________________
> Unicode mailing list
> [email protected]
> http://unicode.org/mailman/listinfo/unicode

