Re: iso-2022-jp encoding on EBCDIC

rajarshi das Wed, 21 Dec 2005 05:57:58 -0800

Created ticket # 16663.

SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:

On Mon, 19 Dec 2005 22:28:55 -0800 (PST), rajarshi das <[EMAIL PROTECTED]> wrote:

> I am testing this with iso-2022-jp encoding :
> ------------------------
> use encoding 'iso-2022-jp';
>
> $a = "^[$B$!^[(B";
> print "a : $a\n";
> ------------------------
>
> On linux, I get :
> a : ^[^[(B
> /* Why is the '(B' shown? Isn't this just an escape
> sequence to switch over to ASCII? */

In a double-quoted string, $B and $! are interpolated
as variables;

that is, $a = '^[' . $B . $! . '^[(B'; in other words,
a concatenation of the literal ^[, the variable $B, the variable $!,
and the literal ^[(B.

Also, ^[ as typed here is CIRCUMFLEX ACCENT + LEFT SQUARE BRACKET,
not the control character ESCAPE.
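The two points above can be checked with a small sketch. It is only an illustration, not part of the original test script, and the variable names ($caret, $esc, $literal, $escaped) are made up for the example. It shows that '^[' is two ordinary characters while "\e" is one control character, and that single quotes or backslash-escaped dollar signs keep $B and $! from being interpolated.

```perl
use strict;
use warnings;

# Caret notation typed into source code: two ordinary characters, '^' and '['.
my $caret = '^[';

# A real ESCAPE control character: a single character.
my $esc = "\e";

# Single quotes: the text $B$! stays literal, nothing is interpolated.
my $literal = '$B$!';

# Double quotes with escaped dollar signs give the same four characters.
my $escaped = "\$B\$!";

printf "length of caret form: %d\n", length $caret;
printf "length of ESCAPE:     %d\n", length $esc;
print  "single-quoted and escaped forms are equal\n" if $literal eq $escaped;
```

This only avoids the interpolation problem; it does not make the graphic form portable to EBCDIC.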

> On ebcdic, I get :
> Malformed UTF-8 character (unexpected end of string)
> at /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl
> line 330.
> Malformed UTF-8 character (unexpected continuation
> byte 0x6a, with no preceding start byte) in pattern
> match (m//) at
> /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl line
> 337.
> Malformed UTF-8 character (unexpected continuation
> byte 0x6a, with no preceding start byte) in pattern
> match (m//) at
> /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl line
> 337.
>
> -- and some junk data.
>
> Seems like in "$B$!^[(B" above, $! and ^[ are
> incorrect two-byte sequences on EBCDIC. However, $!
> does not translate into printable characters on cp-1047.
> What do we replace them by ?

According to JIS X 0208:1997 Appendix 2 (which specifies ISO-2022-JP),
the escape sequences for ISO-2022-JP are "\x1B\x28\x42", "\x1B\x28\x4A",
"\x1B\x24\x40", and "\x1B\x24\x42".

ASCII graphic representations such as "\e$B" are not portable
to EBCDIC, although they are widely used in the ASCII world.

In EBCDIC, ESCAPE "\e" is not \x1B but \x27, DOLLAR SIGN $ is not \x24
but \x5B, and CAPITAL B is not \x42 but \xC2. Do not replace the bytes
of an escape sequence with the graphic characters that correspond to
them in ASCII.

If I understand it correctly, an escape sequence is a sequence of
7-bit or 8-bit combinations, but not a sequence of graphic characters;
an escape sequence is encoded neither in ASCII nor in EBCDIC.
(Though I refer to JIS X 0202, the standard Japanese translation,
instead of the original ISO/IEC 2022.)
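As a rough sketch of this advice, the escape sequences can be built from explicit byte values rather than from graphic characters, so that the source does not depend on how the platform encodes "$" or "B". The hash %ESC_SEQ, its key names, and the helper jis_wrap() are hypothetical names for this illustration; they are not part of Encode.pm. The character set names in the comments follow the usual description of ISO-2022-JP.

```perl
use strict;
use warnings;

# The four ISO-2022-JP escape sequences, written as numeric byte values
# instead of graphic characters such as "$" or "B".
my %ESC_SEQ = (
    ascii           => pack('C*', 0x1B, 0x28, 0x42),  # ESC ( B
    jis_x0201_roman => pack('C*', 0x1B, 0x28, 0x4A),  # ESC ( J
    jis_c6226_1978  => pack('C*', 0x1B, 0x24, 0x40),  # ESC $ @
    jis_x0208_1983  => pack('C*', 0x1B, 0x24, 0x42),  # ESC $ B
);

# Hypothetical helper: switch to JIS X 0208-1983, emit the given
# double-byte data, then switch back to ASCII.
sub jis_wrap {
    my ($jis_bytes) = @_;
    return $ESC_SEQ{jis_x0208_1983} . $jis_bytes . $ESC_SEQ{ascii};
}

# 0x24 0x21 is the byte pair that ASCII users write as "$!".
my $octets = jis_wrap(pack('C*', 0x24, 0x21));

# Dump the result as hex: 1B 24 42 24 21 1B 28 42
print join(' ', map { sprintf '%02X', $_ } unpack('C*', $octets)), "\n";
```

Written this way, the same source yields the same numeric byte values regardless of the native character set, which is the property that the graphic form "\e$B" lacks.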

> I tested again with :
> ---------------------------------
> use encoding 'iso-2022-jp';
> $a = "$B&&(B"; # && is \x50\x50 on EBCDIC, which is
>                # valid according to jis0208.ucm
> print "a : $a\n";
> ----------------------------------
>
> But I still get the messages as above and some junk
> data in $a, which I don't think is the correct output.

As Encode.pm is a CPAN module, perhaps bugs in it should be
reported to the maintainer of the module, rather than
the perl5-porters mailing list.

The site rt.cpan.org helps to report bugs in every distribution
released through CPAN:

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Encode

Regards,
SADAHIRO Tomoyuki
