On Mon, 29 Apr 2002 15:45:09 +0900
Dan Kogai <[EMAIL PROTECTED]> wrote:

> Sadahiro-san and perl-unicode readers,
> 
> I am now working on Encode::JIS2K, an additional converter for JIS X 
> 0213:2000.  When I studied JIS X 0213, I found that for euc-jp, you can 
> make a map so that it covers both JIS X 0212 and JIS X 0213.  I thought 
> they were mutually exclusive but they were not (there are some 
> duplicates, however.  So it was not as straightforward as aggregating 
> two maps).

Excellent.

I'd like to add some explanation. As shown below,
JIS X 0213:2000 plane 2 and JIS X 0212:1997
do not overlap in their KU-TEN (row-cell) positions.
(Rows marked with * contain Kanji [CJK ideographs].)
(Notably, the non-Kanji part of JIS X 0212:1997 also has
 no overlap with JIS X 0208:1997.)

JIS X 0208:1997
                cells defined
  row 1:        1..94.
  row 2:        1..14, 26..33, 42..48, 60..74, 82..89, 94.
  row 3:        16..25, 33..58, 65..90.
  row 4:        1..83.
  row 5:        1..86.
  row 6:        1..24, 33..56.
  row 7:        1..33, 49..81.
  row 8:        1..32.
 *rows 16..46:  1..94.
 *row 47:       1..51.
 *rows 48..83:  1..94.
 *row 84:       1..6.

JIS X 0212:1997
                cells defined
  row 2:        15..25, 34..36, 75..81.
  row 6:        65..69, 71, 73..74, 76, 81..92.
  row 7:        34..46, 82..94.
  row 9:        1..2, 4, 6, 8..9, 11..13, 15..16, 33..48.
  row 10:       1..24, 26..87.
  row 11:       1..27, 29..35, 37..87.
 *rows 16..76:  1..94.
 *row 77:       1..67.

JIS X 0213:2000 plane 2
                cells defined
 *row 1:        1..94.
 *rows 3..5:    1..94.
 *row 8:        1..94.
 *rows 12..15:  1..94.
 *rows 78..93:  1..94.
 *row 94:       1..86.
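
The non-overlap claimed above can be checked mechanically. What follows
is only a minimal sketch in Python (not part of Encode); the helper names
cells() and ku_ten() are made up for illustration, and the tables are
copied by hand from the listing above. For JIS X 0208 only the non-Kanji
rows are included, since they are all that the second check needs.

```python
def cells(*ranges):
    """Expand (first, last) pairs into a set of cell numbers."""
    out = set()
    for first, last in ranges:
        out.update(range(first, last + 1))
    return out

def ku_ten(table):
    """Flatten {row: cells} into a set of (row, cell) pairs."""
    return {(row, cell) for row, cs in table.items() for cell in cs}

FULL = cells((1, 94))

# JIS X 0208:1997, non-Kanji rows only (rows 1..8 of the listing).
X0208_NON_KANJI = {
    1: FULL,
    2: cells((1, 14), (26, 33), (42, 48), (60, 74), (82, 89), (94, 94)),
    3: cells((16, 25), (33, 58), (65, 90)),
    4: cells((1, 83)),
    5: cells((1, 86)),
    6: cells((1, 24), (33, 56)),
    7: cells((1, 33), (49, 81)),
    8: cells((1, 32)),
}

# JIS X 0212:1997, non-Kanji and Kanji parts.
X0212_NON_KANJI = {
    2: cells((15, 25), (34, 36), (75, 81)),
    6: cells((65, 69), (71, 71), (73, 74), (76, 76), (81, 92)),
    7: cells((34, 46), (82, 94)),
    9: cells((1, 2), (4, 4), (6, 6), (8, 9), (11, 13), (15, 16), (33, 48)),
    10: cells((1, 24), (26, 87)),
    11: cells((1, 27), (29, 35), (37, 87)),
}
X0212_KANJI = {row: FULL for row in range(16, 77)}
X0212_KANJI[77] = cells((1, 67))
X0212 = {**X0212_NON_KANJI, **X0212_KANJI}

# JIS X 0213:2000 plane 2.
X0213_P2 = {row: FULL
            for row in [1, 3, 4, 5, 8, *range(12, 16), *range(78, 94)]}
X0213_P2[94] = cells((1, 86))

# Both intersections should be empty if the listing is right.
overlap_g3 = ku_ten(X0212) & ku_ten(X0213_P2)
overlap_non_kanji = ku_ten(X0212_NON_KANJI) & ku_ten(X0208_NON_KANJI)
print(len(overlap_g3), len(overlap_non_kanji))
```

The first intersection is empty simply because the two sets never use
the same row; the second relies on the gaps within rows 2, 6 and 7.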

> I have just finished making new euc-jp.ucm that behaves like this;
> 
> for euc-jp,
> * Round-Trips for all JIS X 0201-kana, JIS X 0208 and JIS X 0212 (same 
> as before)
> * Decode-only for those that appear only in JIS X 0213

I doubt that users of 'euc-jp' will
expect it to be combined with JIS X 0213.

Such a mixture would suppress the warning/croaking
on code points that are not defined in the original
EUC-JP (that is, without X 0213), wouldn't it?
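
To illustrate the concern with something runnable: the sketch below uses
Python's built-in codecs, not Perl's Encode, so it only shows the general
idea of a strict decoder. It assumes that Python's "euc_jp" codec covers
JIS X 0212 but not JIS X 0213, and that "euc_jisx0213" covers JIS X 0213
plane 2; try_decode() is a made-up helper.

```python
# 0x8F (SS3) introduces a G3 character. 0xA1 0xA1 is row 1, cell 1,
# which the tables above list for JIS X 0213 plane 2 but not for
# JIS X 0212.
sample = b"\x8f\xa1\xa1"

def try_decode(data, codec):
    """Return the decoded string, or None if the codec rejects the input."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None

strict = try_decode(sample, "euc_jp")        # expected to be rejected
wide = try_decode(sample, "euc_jisx0213")    # expected to be accepted
print(strict is None, wide is not None)
```

A decoder that quietly accepts both repertoires under the single name
'euc-jp' would lose the rejection in the first call.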

EUC-JP is not defined as including JIS X 0213,
and EUC-JISX0213 is not specified as including JIS X 0212.
(Strictly speaking, JIS excludes JIS X 0201 kana from EUC-JISX0213.)

As article 6.3 of the explanation (`kaisetsu') of JIS X 0213:2000
mentions, overlap between JIS X 0212 and JIS X 0213 plane 2
has been avoided by design.
Since both are meant to be used as G3 in the EUC scheme,
this helps to tell EUC-JP from EUC-JISX0213
and vice versa; it is not intended to make the G3 set
a mixed bag of X 0213 plane 2 and X 0212.
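
How the disjoint rows help to tell the two encodings apart can be
sketched as follows. This is a rough illustration in Python, not a real
detector: g3_hint() is a hypothetical name, it looks only at the row byte
of each three-byte G3 sequence (0x8F, row + 0xA0, cell + 0xA0), and it
does not check whether the cell itself is defined.

```python
# Rows used in G3, taken from the listing of defined cells above.
X0212_ROWS = {2, 6, 7, 9, 10, 11, *range(16, 78)}
X0213_P2_ROWS = {1, 3, 4, 5, 8, *range(12, 16), *range(78, 95)}

def g3_hint(data: bytes) -> str:
    """Guess the EUC variant from the rows of its G3 (0x8F) characters."""
    seen = set()
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x8F and i + 2 < len(data):
            row = data[i + 1] - 0xA0
            if row in X0212_ROWS:
                seen.add("EUC-JP")
            elif row in X0213_P2_ROWS:
                seen.add("EUC-JISX0213")
            i += 3
        elif b == 0x8E or b >= 0xA1:
            i += 2          # G2 (SS2 + one byte) or a two-byte G1 character
        else:
            i += 1          # ASCII, or a byte this sketch does not interpret
    if len(seen) == 1:
        return seen.pop()
    return "mixed" if seen else "unknown"

print(g3_hint(b"\x8f\xb0\xa1"))   # row 16: a JIS X 0212 row
print(g3_hint(b"\x8f\xa1\xa1"))   # row 1: a JIS X 0213 plane 2 row
```

A text that yields "mixed" is exactly the case a combined G3 set would
make legal, at the cost of this distinction.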

IMO, if you really need to provide a mixture of JIS X 0213 and JIS X 0212,
it would be better to do so under a name
other than EUC-JP or EUC-JISX0213.

> Remind you that this new euc-jp.ucm is NOT THE SAME as euc-jp2k.ucm that 
> is to be included in Encode::JIS2K;
> 
> for euc-jisx0213,
> * Round-Trips for all JIS X 0201-kana and JIS X 0213 (both planes)
> * Decode-only for those that appear only in JIS X 0212
> * Those that conflict with JIS X 0208 and JIS X 0213-plane1, JIS X 0213 
> definition is used.   Only these 3 are different (so JIS X 0213-plane1 
> is ALMOST a superset of JIS X 0208).
> 
> euc-jp
> <UFFE3> \xA1\xB1 |0 # FULLWIDTH MACRON
> <U2015> \xA1\xBD |0 # HORIZONTAL BAR
> <UFFE5> \xA1\xEF |0 # FULLWIDTH YEN SIGN
> 
> euc-jisx0213
> <U203E> \xA1\xB1 |0 # OVERLINE
> <U2014> \xA1\xBD |0 # EM DASH
> <U00A5> \xA1\xEF |0 # YEN SIGN
> 
> In short, euc-jp and euc-jisx0213 differ only in encode() and decoders 
> can decode both euc-jp(1990) and euc-jisx0213.
> 
> If no one objects,  I will use a new map for euc-jp in Encode-1.64 or 
> later and Encode::JIS2K is to follow.
> 
> Dan the Encode Maintainer

Regards,
SADAHIRO Tomoyuki
