I don't know about the points you raise, but I wish it were easier to help proofread Unihan data. Back in 2012 I compared kKangXi to kIRGKangXi and found 252 conflicts, in addition to the cases where a character has only one or the other. I even put together a simple tool to help fix this, with links to the relevant pages of the online Kang Xi[1]. I had no replies…
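For anyone who wants to reproduce this kind of check, here is a minimal sketch in Python. It is not the tool linked in [1]. It assumes the input is in the tab-separated Unihan text layout (code point, field name, value), and the function names and the sample rows are made up for illustration; the sample values should not be read as real Unihan data.

```python
from collections import defaultdict


def parse_unihan(lines):
    """Collect {codepoint: {field: value}} from Unihan-style lines.

    Assumes each data line is "U+XXXX<TAB>field<TAB>value";
    blank lines and lines starting with "#" are skipped.
    """
    data = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # not a data row in the assumed layout
        codepoint, field, value = parts
        data[codepoint][field] = value
    return data


def find_conflicts(data, field_a="kKangXi", field_b="kIRGKangXi"):
    """Return (codepoint, value_a, value_b) where both fields exist but differ.

    Values are compared as sets of space-separated entries, so the
    order of multiple entries does not matter (an assumption).
    """
    conflicts = []
    for codepoint in sorted(data, key=lambda cp: int(cp[2:], 16)):
        fields = data[codepoint]
        if field_a in fields and field_b in fields:
            if set(fields[field_a].split()) != set(fields[field_b].split()):
                conflicts.append((codepoint, fields[field_a], fields[field_b]))
    return conflicts


# Illustrative rows only, not real Unihan values.
sample = [
    "# illustrative sample",
    "U+4E00\tkKangXi\t0075.010",
    "U+4E00\tkIRGKangXi\t0075.010",
    "U+4E01\tkKangXi\t0075.020",
    "U+4E01\tkIRGKangXi\t0075.030",
    "U+4E02\tkKangXi\t0075.041",  # only one field present: not a conflict
]

for cp, a, b in find_conflicts(parse_unihan(sample)):
    print(cp, a, b)
```

Characters that carry only one of the two fields are deliberately left out of the result, matching the distinction made above. Passing other field names, such as kRSKangXi and kRSUnicode, compares those fields instead, though this sketch compares whole values, stroke counts included, rather than the radical alone.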
[1] http://namakajiri.net/misc/unihan_kangxi/compare_existing.html for characters in Kang Xi, and for the others, http://namakajiri.net/misc/unihan_kangxi/compare_nonexisting.html

2014-03-09 9:39 GMT-03:00 Adam Nohejl <[email protected]>:

> Hello again,
>
> I would be really grateful for any reply or at least pointers to relevant
> information about this topic (stroke-order data in Unihan, see my previous
> message below).
>
> Or is there any other appropriate place to discuss this?
>
> Thank you,
>
> --
> Adam
>
> On 2014/02/28, at 19:56, Adam Nohejl <[email protected]> wrote:
> >
> > Hello,
> >
> > I am comparing radical data for CJK characters from different sources,
> > including the Unihan database. According to the Unihan documentation* the
> > kRSUnicode radical should correspond to kRSKangXi radical, which in turn
> > should be based on the Kang Xi dictionary.
> >
> > Is there any explanation for the following discrepancies? Did I miss any
> > other rules or reasoning behind the content of these two fields?
> >
> > Examples of the discrepancies:
> >
> > (1) A very common character for "most, maximum".
> > U+6700 kRSKangXi 73.8
> > U+6700 kRSUnicode 13.10
> >
> > (2) A funny character for autumn containing the turtle component.
> > U+9F9D kRSKangXi 115.16
> > U+9F9D kRSKanWa 115.16
> > U+9F9D kRSUnicode 213.5
> >
> > There are also characters that actually are not included in the Kang Xi
> > dictionary**, but the Unihan data contain both a purported Kang Xi radical
> > and in addition to that a _different_ Unicode radical.
> >
> > (3) The simplified turtle character (commonly assigned to the
> > traditional radical #213):
> > U+4E80 kRSKangXi 213.0
> > U+4E80 kRSUnicode 5.10
> >
> > (4) Character with the radical #72/73 at the top, i.e. IMHO an arbitrary
> > decision, but unexpectedly the fields differ:
> > U+66FB kRSKangXi 72.7
> > U+66FB kRSUnicode 73.7
> >
> > - - -
> >
> > [*] <http://www.unicode.org/reports/tr38/tr38-8.html>: "Property:
> > kRSUnicode // Description: (...) The first value is intended to reflect the
> > same radical as the kRSKangXi field and the stroke count of the glyph used
> > to print the character within the Unicode Standard."
> >
> > [**] The two characters are missing from the '89 edition of Kang Xi
> > (which should be the same as used for Unihan) according to search on this
> > site: <http://ctext.org/dictionary.pl>
> >
>
> _______________________________________________
> Unicode mailing list
> [email protected]
> http://unicode.org/mailman/listinfo/unicode

