On Mon, 29 Apr 2002 15:45:09 +0900 Dan Kogai <[EMAIL PROTECTED]> wrote:
> Sadahiro-san and perl-unicode readers,
>
> I am now working on Encode::JIS2K, an additional converter for JIS X
> 0213:2000.  When I studied JIS X 0213, I found that for euc-jp, you can
> make a map so that it covers both JIS X 0212 and JIS X 0213.  I thought
> they were mutually exclusive but they were not (there are some
> duplicates, however.  So it was not as straightforward as aggregating
> two maps).

Excellent. I'd like to add some explanation.

As shown below, JIS X 0213:2000 plane 2 and JIS X 0212:1997 do not
overlap in their KU-TEN (row-cell) positions.
(Rows marked with * contain Kanji [CJK ideographs].)
(Notably, the non-Kanji part of JIS X 0212:1997 also has no overlap
with JIS X 0208:1997.)

JIS X 0208:1997 cells defined
   row 1: 1..94.
   row 2: 1..14, 26..33, 42..48, 60..74, 82..89, 94.
   row 3: 16..25, 33..58, 65..90.
   row 4: 1..83.
   row 5: 1..86.
   row 6: 1..24, 33..56.
   row 7: 1..33, 49..81.
   row 8: 1..32.
  *rows 16..46: 1..94.
  *row 47: 1..51.
  *rows 48..83: 1..94.
  *row 84: 1..6.

JIS X 0212:1997 cells defined
   row 2: 15..25, 34..36, 75..81.
   row 6: 65..69, 71, 73..74, 76, 81..92.
   row 7: 34..46, 82..94.
   row 9: 1..2, 4, 6, 8..9, 11..13, 15..16, 33..48.
   row 10: 1..24, 26..87.
   row 11: 1..27, 29..35, 37..87.
  *rows 16..76: 1..94.
  *row 77: 1..67.

JIS X 0213:2000 plane 2 cells defined
  *row 1: 1..94.
  *rows 3..5: 1..94.
  *row 8: 1..94.
  *rows 12..15: 1..94.
  *rows 78..93: 1..94.
  *row 94: 1..86.

> I have just finished making new euc-jp.ucm that behaves like this;
>
> for euc-jp,
> * Round-Trips for all JIS X 0201-kana, JIS X 0208 and JIS X 0212 (same
>   as before)
> * Decode-only for those that appear only in JIS X 0213

I doubt that users of 'euc-jp' will expect it to be combined with
JIS X 0213. Such mixing would prevent warning/croaking on code points
that are not defined in the original EUC-JP (that is, without X 0213),
wouldn't it?

EUC-JP is not defined as including JIS X 0213,
and EUC-JISX0213 is not specified as including JIS X 0212.
(Strictly speaking, JIS excludes JIS X 0201 kana from EUC-JISX0213.)

As section 6.3 of the explanation (`kaisetsu') of JIS X 0213:2000
mentions, overlap between JIS X 0212 and JIS X 0213 plane 2 has been
avoided by design, since both are to be used in G3 in the EUC scheme.
This should help to tell EUC-JP from EUC-JISX0213 and vice versa,
but it is not intended to make the G3 set a mixed bag of
X 0213 plane 2 and X 0212.

IMO, if you must provide a mixture of JIS X 0213 with JIS X 0212,
it would be better to do so under a name other than EUC-JP or
EUC-JISX0213.

> Remind you that this new euc-jp.ucm is NOT THE SAME as euc-jp2k.ucm that
> is to be included in Encode::JIS2K;
>
> for euc-jisx0213,
> * Round-Trips for all JIS X 0201-kana and JIS X 0213 (both planes)
> * Decode-only for those that appear only in JIS X 0212
> * Those that conflict with JIS X 0208 and JIS X 0213-plane1, JIS X 0213
>   definition is used.  Only these 3 are different (so JIS X 0213-plane1
>   is ALMOST a superset of JIS X 0208).
>
> euc-jp
> <UFFE3> \xA1\xB1 |0 # FULLWIDTH MACRON
> <U2015> \xA1\xBD |0 # HORIZONTAL BAR
> <UFFE5> \xA1\xEF |0 # FULLWIDTH YEN SIGN
>
> euc-jisx0213
> <U203E> \xA1\xB1 |0 # OVERLINE
> <U2014> \xA1\xBD |0 # EM DASH
> <U00A5> \xA1\xEF |0 # YEN SIGN
>
> In short, euc-jp and euc-jisx0213 differ only in encode() and decoders
> can decode both euc-jp(1990) and euc-jisx0213.
>
> If no one objects, I will use a new map for euc-jp in Encode-1.64 or
> later and Encode::JIS2K is to follow.
>
> Dan the Encode Maintainer

Regards,
SADAHIRO Tomoyuki
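
P.S. The non-overlap claims about the row-cell tables earlier in this
message can be checked mechanically. What follows is only a minimal
sketch in Python, not part of Encode: the table constants and the
helpers `cells` and `euc_bytes` are hypothetical names introduced
here, and the ranges are transcribed by hand from the tables above.
The Kanji rows of JIS X 0208 are left out on purpose, since only its
non-Kanji rows (1..8) could collide with the non-Kanji rows of
JIS X 0212. `euc_bytes` assumes the usual EUC layout: each of row and
cell is offset by 0xA0, and a G3 character is prefixed with SS3 (0x8F).

```python
# Each table entry: (first_row, last_row, [(first_cell, last_cell), ...]).
FULL = [(1, 94)]

JISX0208_NONKANJI = [
    (1, 1, FULL),
    (2, 2, [(1, 14), (26, 33), (42, 48), (60, 74), (82, 89), (94, 94)]),
    (3, 3, [(16, 25), (33, 58), (65, 90)]),
    (4, 4, [(1, 83)]),
    (5, 5, [(1, 86)]),
    (6, 6, [(1, 24), (33, 56)]),
    (7, 7, [(1, 33), (49, 81)]),
    (8, 8, [(1, 32)]),
]

JISX0212_NONKANJI = [
    (2, 2, [(15, 25), (34, 36), (75, 81)]),
    (6, 6, [(65, 69), (71, 71), (73, 74), (76, 76), (81, 92)]),
    (7, 7, [(34, 46), (82, 94)]),
    (9, 9, [(1, 2), (4, 4), (6, 6), (8, 9), (11, 13), (15, 16), (33, 48)]),
    (10, 10, [(1, 24), (26, 87)]),
    (11, 11, [(1, 27), (29, 35), (37, 87)]),
]
JISX0212_KANJI = [(16, 76, FULL), (77, 77, [(1, 67)])]
JISX0212 = JISX0212_NONKANJI + JISX0212_KANJI

JISX0213_P2 = [
    (1, 1, FULL), (3, 5, FULL), (8, 8, FULL),
    (12, 15, FULL), (78, 93, FULL), (94, 94, [(1, 86)]),
]


def cells(table):
    """Expand a range table into a set of (row, cell) pairs."""
    out = set()
    for row_lo, row_hi, spans in table:
        for row in range(row_lo, row_hi + 1):
            for lo, hi in spans:
                out.update((row, cell) for cell in range(lo, hi + 1))
    return out


def euc_bytes(row, cell, g3=False):
    """Map a row-cell pair to EUC bytes (assumed layout, see lead-in)."""
    body = bytes([0xA0 + row, 0xA0 + cell])
    return b"\x8f" + body if g3 else body


if __name__ == "__main__":
    # Both intersections should be empty if the tables are disjoint.
    print(sorted(cells(JISX0212) & cells(JISX0213_P2)))
    print(sorted(cells(JISX0212_NONKANJI) & cells(JISX0208_NONKANJI)))
    # Row 1, cell 17 is the first of the three differing entries.
    print(euc_bytes(1, 17).hex())
```

With this layout, the three entries quoted from the .ucm files
(\xA1\xB1, \xA1\xBD, \xA1\xEF) correspond to row 1, cells 17, 29
and 79.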