On Sun, 29 Aug 2010 14:07:35 +0200
Uriah Eisenstein <[email protected]> wrote:

>UAX #38 (Unihan) defines the kIRG_USource field as a reference into the
>U-source ideograph database described in UTR #45, having the form "UTC
>nnnnn". However, several CJK Compatibility Ideographs are mapped to their
>own code point values, e.g. "U+FA0C kIRG_USource U+FA0C". The formal
>syntax of kIRG_USource allows this, but I've found no explanation as to the
>meaning of such a mapping; there is also no such mapping from a code point
>to another code point.

I think this is worth pointing out. U+FA0C was originally introduced for
round-trip conversion between ISO/IEC 10646 and Big5, but it is slightly
difficult to learn that background from the properties in the current
Unihan.txt. U+FA0C is still a relatively easy example to understand,
because its kDefinition mentions this. U+FA0D was also introduced for
compatibility with Big5, but its entry does not say so.

Recently, CJK compatibility ideographs have been proposed in order to
assign code points to "characters" whose shape differences are unifiable
with existing characters. U+F900 - U+FA0B (for KS X 1001:1998
compatibility) and U+FA0C - U+FA0D (for Big5 compatibility) are
exceptional, because their glyph shapes have no differences at all from
existing characters. Some people would expect such information to be
available.

For compatibility characters with subtle differences in their shapes,
I'm not sure whether the historical background is needed or not. The
compatibility ideographs introduced for IBM Kanji for the Japanese market
had subtle differences from the exemplification glyphs in the Japanese
industrial standards at the time IBM developed them. But later, newer
Japanese industrial standards recognized that it is reasonable to code
some of them at separate code points. Therefore, Unihan.txt lists
properties such as:

    U+FA0F  kIBMJapan          FA9B
    U+FA0F  kIRG_JSource       3-2F4B
    U+FA0F  kIRG_USource       U+FA0F
    U+FA0F  kJIS0213           1,15,43
    U+FA0F  kRSAdobe_Japan1_6  C+8421+32.3.7 C+8421+150.7.3

I'm not sure whether all possible variants for JIS X 0213 can be
recognized with "compatible with IBMJapan".

# I forgot to check who provided the font used to print the
# characters introduced for IBM Kanji in ISO/IEC 10646.

Uriah, do you think historical background information about each
compatibility ideograph should be noted in Unihan.txt?

Regards,
mpsuzuki

