On Mon, 29 Apr 2002 15:45:09 +0900
Dan Kogai <[EMAIL PROTECTED]> wrote:
> Sadahiro-san and perl-unicode readers,
>
> I am now working on Encode::JIS2K, an additional converter for JIS X
> 0213:2000. When I studied JIS X 0213, I found that for euc-jp, you can
> make a map so that it covers both JIS X 0212 and JIS X 0213. I thought
> they were mutually exclusive but they were not (there are some
> duplicates, however, so it was not as straightforward as aggregating
> two maps).
Excellent.
I'd like to add some explanation. As shown below,
JIS X 0213:2000 plane 2 and JIS X 0212:1997
do not overlap in their KU-TEN (row-cell) positions.
(Rows marked with * contain Kanji [CJK ideographs].)
(Note also that the non-Kanji part of JIS X 0212:1997
does not overlap with JIS X 0208:1997 either.)
JIS X 0208:1997
cells defined
row 1: 1..94.
row 2: 1..14, 26..33, 42..48, 60..74, 82..89, 94.
row 3: 16..25, 33..58, 65..90.
row 4: 1..83.
row 5: 1..86.
row 6: 1..24, 33..56.
row 7: 1..33, 49..81.
row 8: 1..32.
*rows 16..46: 1..94.
*row 47: 1..51.
*rows 48..83: 1..94.
*row 84: 1..6.
JIS X 0212:1997
cells defined
row 2: 15..25, 34..36, 75..81.
row 6: 65..69, 71, 73..74, 76, 81..92.
row 7: 34..46, 82..94.
row 9: 1..2, 4, 6, 8..9, 11..13, 15..16, 33..48.
row 10: 1..24, 26..87.
row 11: 1..27, 29..35, 37..87.
*rows 16..76: 1..94.
*row 77: 1..67.
JIS X 0213:2000 plane 2
cells defined
*row 1: 1..94.
*rows 3..5: 1..94.
*row 8: 1..94.
*rows 12..15: 1..94.
*rows 78..93: 1..94.
*row 94: 1..86.
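The two non-overlap claims can be checked mechanically from the ranges listed above. The following is only a minimal sketch in Python; the helper names (cells, expand) and the table variables are made up for illustration. The Kanji rows of JIS X 0208 are left out on purpose, because the non-Kanji part of JIS X 0212 lives in rows 2..11 and cannot collide with rows 16 and up.

```python
def cells(*ranges):
    """Expand (first, last) pairs into a set of cell numbers."""
    return {c for lo, hi in ranges for c in range(lo, hi + 1)}

def expand(table):
    """Turn {(first_row, last_row): cell_set} into a set of (row, cell) pairs."""
    return {(row, cell)
            for (first, last), cell_set in table.items()
            for row in range(first, last + 1)
            for cell in cell_set}

FULL = cells((1, 94))

# Non-Kanji rows of JIS X 0208:1997, copied from the list above.
X0208_NON_KANJI = {
    (1, 1): FULL,
    (2, 2): cells((1, 14), (26, 33), (42, 48), (60, 74), (82, 89), (94, 94)),
    (3, 3): cells((16, 25), (33, 58), (65, 90)),
    (4, 4): cells((1, 83)),
    (5, 5): cells((1, 86)),
    (6, 6): cells((1, 24), (33, 56)),
    (7, 7): cells((1, 33), (49, 81)),
    (8, 8): cells((1, 32)),
}

# JIS X 0212:1997, split into its non-Kanji and Kanji parts.
X0212_NON_KANJI = {
    (2, 2): cells((15, 25), (34, 36), (75, 81)),
    (6, 6): cells((65, 69), (71, 71), (73, 74), (76, 76), (81, 92)),
    (7, 7): cells((34, 46), (82, 94)),
    (9, 9): cells((1, 2), (4, 4), (6, 6), (8, 9), (11, 13), (15, 16), (33, 48)),
    (10, 10): cells((1, 24), (26, 87)),
    (11, 11): cells((1, 27), (29, 35), (37, 87)),
}
X0212_KANJI = {(16, 76): FULL, (77, 77): cells((1, 67))}

# JIS X 0213:2000 plane 2 (all rows bear Kanji).
X0213_P2 = {
    (1, 1): FULL, (3, 5): FULL, (8, 8): FULL,
    (12, 15): FULL, (78, 93): FULL, (94, 94): cells((1, 86)),
}

x0212 = expand(X0212_NON_KANJI) | expand(X0212_KANJI)
x0213_p2 = expand(X0213_P2)

# Both intersections should come out empty if the lists are right.
overlap_g3 = x0212 & x0213_p2
overlap_non_kanji = expand(X0212_NON_KANJI) & expand(X0208_NON_KANJI)
print(len(overlap_g3), len(overlap_non_kanji))
```

The same tables also give the cell counts for each set, which is a quick way to catch a typo in a range.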
> I have just finished making new euc-jp.ucm that behaves like this;
>
> for euc-jp,
> * Round-Trips for all JIS X 0201-kana, JIS X 0208 and JIS X 0212 (same
> as before)
> * Decode-only for those that appear only in JIS X 0213
I doubt that users of 'euc-jp' will
expect it to be combined with JIS X 0213.
Such mixing would suppress the warning/croaking
on code points that are undefined in
the original EUC-JP (that is, without X 0213), wouldn't it?
EUC-JP is not defined as including JIS X 0213,
and EUC-JISX0213 is not specified as including JIS X 0212.
(Strictly speaking, JIS excludes JIS X 0201 kana from EUC-JISX0213.)
As article 6.3 of the explanation (`kaisetsu') of JIS X 0213:2000
mentions, overlap between JIS X 0212 and JIS X 0213 plane 2
was avoided by design.
Both are meant to be used as G3 in the EUC scheme,
so keeping them disjoint helps to tell EUC-JP from EUC-JISX0213
and vice versa; it is not intended to make the G3 set
a mixed bag of X 0213 plane 2 and X 0212.
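To illustrate how the disjoint rows help to tell the two encodings apart, here is a rough sketch in Python. It assumes well-formed EUC input, looks only at the row of each character introduced by SS3 (0x8F), and ignores the partially filled rows (77 in X 0212, 94 in X 0213 plane 2). The function names are hypothetical and this is not how Encode does it.

```python
SS2 = 0x8E  # single shift two: one byte from G2 follows
SS3 = 0x8F  # single shift three: two bytes from G3 follow

# Rows used by each G3 candidate, taken from the lists earlier in this mail.
X0212_ROWS = {2, 6, 7, 9, 10, 11} | set(range(16, 78))
X0213_P2_ROWS = {1, 3, 4, 5, 8} | set(range(12, 16)) | set(range(78, 95))

def g3_rows(data):
    """Yield the row (ku) of every SS3-introduced character in an EUC byte string."""
    i = 0
    while i < len(data):
        b = data[i]
        if b == SS3 and i + 2 < len(data):
            yield data[i + 1] - 0xA0   # row = first byte minus 0xA0
            i += 3
        elif b == SS2:
            i += 2                     # skip the single G2 byte
        elif b >= 0xA1:
            i += 2                     # two-byte G1 character
        else:
            i += 1                     # ASCII or a stray byte

def guess_g3_set(data):
    """Guess which character set the G3 characters of `data` belong to."""
    rows = set(g3_rows(data))
    if not rows:
        return "no G3 characters"
    if rows <= X0212_ROWS:
        return "JIS X 0212"
    if rows <= X0213_P2_ROWS:
        return "JIS X 0213 plane 2"
    return "mixed or invalid"

print(guess_g3_set(b"\x8f\xb0\xa1"))  # SS3 + row 16, cell 1
print(guess_g3_set(b"\x8f\xa1\xa1"))  # SS3 + row 1, cell 1
```

A text whose G3 rows fall into both sets is exactly the mixed bag described above, and the sketch reports it as such rather than picking one name.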
IMO, if you really need to provide a mixture of JIS X 0213 and JIS X 0212,
it would be better to do so under a name
other than EUC-JP or EUC-JISX0213.
> Note that this new euc-jp.ucm is NOT THE SAME as euc-jp2k.ucm that
> is to be included in Encode::JIS2K;
>
> for euc-jisx0213,
> * Round-Trips for all JIS X 0201-kana and JIS X 0213 (both planes)
> * Decode-only for those that appear only in JIS X 0212
> * Where JIS X 0208 and JIS X 0213-plane1 conflict, the JIS X 0213
> definition is used. Only these 3 are different (so JIS X 0213-plane1
> is ALMOST a superset of JIS X 0208).
>
> euc-jp
> <UFFE3> \xA1\xB1 |0 # FULLWIDTH MACRON
> <U2015> \xA1\xBD |0 # HORIZONTAL BAR
> <UFFE5> \xA1\xEF |0 # FULLWIDTH YEN SIGN
>
> euc-jisx0213
> <U203E> \xA1\xB1 |0 # OVERLINE
> <U2014> \xA1\xBD |0 # EM DASH
> <U00A5> \xA1\xEF |0 # YEN SIGN
>
> In short, euc-jp and euc-jisx0213 differ only in encode() and decoders
> can decode both euc-jp(1990) and euc-jisx0213.
>
> If no one objects, I will use a new map for euc-jp in Encode-1.64 or
> later and Encode::JIS2K is to follow.
>
> Dan the Encode Maintainer
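For reference, the three conflicting code points quoted above can be put side by side with a few lines of Python. This is only an illustrative sketch: the two dictionaries are hand-copied from the quoted .ucm lines and are not read from any real map file, and the variable names are made up.

```python
import unicodedata

# Hand-copied from the quoted .ucm fragments (bytes -> Unicode character).
EUC_JP = {
    b"\xa1\xb1": "\uffe3",
    b"\xa1\xbd": "\u2015",
    b"\xa1\xef": "\uffe5",
}
EUC_JISX0213 = {
    b"\xa1\xb1": "\u203e",
    b"\xa1\xbd": "\u2014",
    b"\xa1\xef": "\u00a5",
}

# For each byte sequence, collect the character name under each map.
differences = [
    (code.hex(), unicodedata.name(EUC_JP[code]), unicodedata.name(EUC_JISX0213[code]))
    for code in sorted(EUC_JP)
]

for code_hex, jp_name, x0213_name in differences:
    print(f"{code_hex}: euc-jp={jp_name}, euc-jisx0213={x0213_name}")
```

The same bytes map to a different character under each name, so a round trip through one encoding and back through the other does not preserve these three characters.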
Regards,
SADAHIRO Tomoyuki