On Saturday, March 30, 2002, at 04:44, Jarkko Hietaniemi wrote:
> Gentlemen, you may want to read Unicode 3.2
> ( http://www.unicode.org/unicode/reports/tr28/ ) It does say something
> about Han, Katakana, and Hangul (sections 10.1, 10.3, and 10.4). (No,
> I don't know what happened to 10.2). What I'm after is whether the
> said CJK changes affect Encode?

For Japanese, I very much doubt it, at least for the time being. JIS X 0213:2000, as you see, is only two years old, and the encodings that support it are not popular -- yet. Support will take the form of ADDITION, not MODIFICATION, at least as far as JIS X 0213 is concerned.

But let me post a summary of the (proposed) encodings for JIS X 0213 for the record. (See also http://www.asahi-net.or.jp/~wq6k-yn/code/enc-x0213.html if your browser supports Japanese.)

JIS X 0213
==========

It is a tidied-up (JIS X 0208 + JIS X 0212). It consists of two 94x94 planes. Plane 1 corresponds to 0208 and plane 2 to 0212, but some of the code points are rearranged, so 0213-1 != 0208 and 0213-2 != 0212.

EUC-JISX0213
============

The encoding scheme is the same as EUC-JP. Here is the diagram:

  G0  US-ASCII
  G1  JISX0213-1
 (G2  JISX0201-kana (deprecated))
  G3  JISX0213-2

Technical difficulty is minimal. All I need is a table. I may make a UCM out of the Unihan DB and post it as something like Encode::JPExtra.

When in use, this encoding supersedes EUC-JP, because you can't tell the two apart by looking at a given string. You must explicitly set your encoding to either this or "classical" EUC-JP.

ISO-2022-JP-3
=============

Basically, this one is ISO-2022-JP with new escape sequences.

  Esc. Seq.   Charset
  ------------------------
  ESC $(O     JISX0213-1
  ESC $(P     JISX0213-2

This one is easy, too. Unlike EUC-JISX0213, this one EXTENDS ISO-2022-JP, and the old 0208/0212 and 0213 can coexist, thanks to the escape sequences.

Shift_JISX0213
==============

And the most controversial one. This one squeezes into what was left unused in Shift_JIS. Shift_JIS was already acrobatic, and this one is a nightmare. However, this one also has only 2 bytes max, so supporting it is not that hard. But unlike the cases above, I need a UTF-8 => Shift_JISX0213 mapping instead of vanilla JISX0213, and I am not sure one is available. I'll look into it.

As for Hangul, I'll let experts like Jungshik review the impact....

Dan the Man with Even more Encodings
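
As a footnote to the EUC-JISX0213 and ISO-2022-JP-3 sections, here is a minimal Python sketch of why the two behave differently. It is not Encode code and not anyone's actual implementation: the helper names split_euc and jisx0213_designations are made up for illustration, the byte ranges are the usual EUC code-set layout (assumed to apply unchanged, since the scheme is said to be the same as EUC-JP), and the only escape sequences it knows are the two listed in the table. The EUC splitter can recover G0/G1/G2/G3 structure but nothing in the bytes says whether G1 means 0208 or 0213-1, whereas the ISO-2022 scanner can read the charset straight off the escape sequence.

```python
from __future__ import annotations

# Sketch only: helper names are hypothetical, not part of any library.

ESC = b"\x1b"
SS2, SS3 = 0x8E, 0x8F  # EUC single shifts selecting G2 and G3

# The two designations listed for ISO-2022-JP-3 in the table above.
JISX0213_ESCAPES = {
    ESC + b"$(O": "JISX0213-1",
    ESC + b"$(P": "JISX0213-2",
}


def split_euc(data: bytes) -> list[tuple[str, bytes]]:
    """Split an EUC byte string into (code set, raw bytes) units.

    The result is identical for EUC-JP and EUC-JISX0213 input, which is
    exactly why the encoding has to be chosen explicitly.
    """
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            width, name = 1, "G0"   # US-ASCII
        elif b == SS2:
            width, name = 2, "G2"   # SS2 + one kana byte
        elif b == SS3:
            width, name = 3, "G3"   # SS3 + two bytes
        elif 0xA1 <= b <= 0xFE:
            width, name = 2, "G1"   # two bytes, both with the high bit set
        else:
            raise ValueError(f"unexpected byte 0x{b:02x} at offset {i}")
        if i + width > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        units.append((name, data[i:i + width]))
        i += width
    return units


def jisx0213_designations(data: bytes) -> list[str]:
    """List the JIS X 0213 planes designated in an ISO-2022 stream, in order."""
    found = []
    i = data.find(ESC)
    while i != -1:
        seq = data[i:i + 4]  # ESC $ ( O and ESC $ ( P are four bytes long
        if seq in JISX0213_ESCAPES:
            found.append(JISX0213_ESCAPES[seq])
        i = data.find(ESC, i + 1)
    return found


if __name__ == "__main__":
    # One ASCII byte, one G1 pair, one G3 triple.
    print(split_euc(b"A\xa4\xa2\x8f\xa1\xa1"))
    # Plane 1, then plane 2, then back to ASCII with ESC ( B.
    print(jisx0213_designations(b"\x1b$(O-!\x1b$(P!!\x1b(B"))
```

A stream that only ever switches with the older escape sequences yields an empty list from jisx0213_designations, which is the sense in which old and new charsets can coexist in ISO-2022-JP-3.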