Well, I've added support for the remaining few fields. While at it, I upgraded to Unihan 6.0.0, which is just out, and made quite a few other improvements. The only remaining data not handled is a single line in each of kFenn and kHKGlyph that contains two entries instead of one; I wasn't sure whether this is intentional.

Nice to see that some of the questionable entries (1- or 2-character kDefinition values, so far) have already been fixed in the new Unihan version :)

Regards,
Uriah

---------- Forwarded message ----------
From: Uriah Eisenstein <[email protected]>
Date: Thu, Sep 30, 2010 at 8:48 PM
Subject: Fwd: Unihan SQL access
To: unicode List <[email protected]>

As usual this took longer than I thought... But an initial version is finally ready, and can be found at http://babelfish.50webs.com/unihan-sql-browser/Unihan%20SQL%20Browser.html. It requires access to the Unihan.zip file and a JDBC driver; there are explanations on the web page which I hope will be enough. Quite a few improvements are already planned... I'd be glad to hear if anyone finds it useful.

While at it, I found a couple of apparent typos in the source indications of variants (using SELECT DISTINCT SOURCE FROM VARIANT_SOURCE). These all come from the kSemanticVariant field:

    SELECT * FROM kSemanticVariant_source WHERE kSemanticVariant_source IN ('kMathews', 'kMeterWempe')

    [U+3C92]  勽 [U+52FD]  kMathews
    勽 [U+52FD]  [U+3C92]  kMathews
    [U+25500]  渹 [U+6E39]  kMeterWempe

Regards,
Uriah Eisenstein

---------- Forwarded message ----------
From: Uriah Eisenstein <[email protected]>
Date: Sun, Sep 12, 2010 at 5:57 PM
Subject: Unihan SQL access
To: unicode List <[email protected]>

Hello,

I'm nearing completion of a simple Java program which loads Unihan data from the source files into a DB and provides SQL access to it. There's still at least a week or so of work on issues I consider essential, but once it is ready I'd be happy to make it available on the Internet if anyone's interested.

So far I've used it to search for possibly erroneous data in Unihan; my latest find is that 73 characters have a kTaiwanTelegraph value of 0000, which seems doubtful. It can also be useful for various statistical information, such as how many characters are listed under each radical, or which blocks include IICore characters.

I'm also considering adding the contents of the Unicode Character Database at a later phase.

Regards,
Uriah Eisenstein
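
For readers without a database at hand, a check like the kTaiwanTelegraph one mentioned in the first message can be approximated directly on the Unihan source lines, which are tab-separated (code point, field name, value) with `#` comment lines. The following is only a minimal sketch and not the program described above: the class name `UnihanFieldCount` and the method `countMatches` are hypothetical, and the sample lines in `main` are made up for illustration rather than taken from Unihan.

```java
import java.util.List;

public class UnihanFieldCount {

    /**
     * Counts records whose field name and value both match exactly.
     * Each record is expected as: code point, TAB, field name, TAB, value.
     */
    static long countMatches(List<String> lines, String field, String value) {
        return lines.stream()
                // Skip blank lines and '#' comment lines.
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                // Split into at most three parts so tabs inside the value survive.
                .map(line -> line.split("\t", 3))
                // Ignore malformed lines that lack a field name or value.
                .filter(parts -> parts.length == 3)
                .filter(parts -> parts[1].equals(field) && parts[2].equals(value))
                .count();
    }

    public static void main(String[] args) {
        // Made-up sample lines for illustration only; not real Unihan records.
        List<String> sample = List.of(
                "# comment line",
                "U+4E00\tkTaiwanTelegraph\t0001",
                "U+4E01\tkTaiwanTelegraph\t0000",
                "U+4E02\tkTaiwanTelegraph\t0000",
                "U+4E02\tkDefinition\texample");
        System.out.println(countMatches(sample, "kTaiwanTelegraph", "0000"));
    }
}
```

In practice the list of lines would be read from the relevant file inside Unihan.zip instead of being hard-coded; once the data is loaded into a DB, the same count is a single SQL query.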

