Created ticket # 16663.
SADAHIRO Tomoyuki <[EMAIL PROTECTED]> wrote:
On Mon, 19 Dec 2005 22:28:55 -0800 (PST), rajarshi das <[EMAIL PROTECTED]> wrote:
> I am testing this with the iso-2022-jp encoding:
> ------------------------
> use encoding 'iso-2022-jp';
>
> $a = "^[$B$!^[(B";
> print "a : $a\n";
> ------------------------
>
> On linux, I get :
> a : ^[^[(B
> /* Why is the '(B' shown? Isn't this just an escape
> char to switch over to ASCII? */
In a double-quoted string, $B and $! are interpolated as variables;
that is, $a = '^[' . $B . $! . '^[(B'. In other words, $a is the
concatenation of the literal ^[, the variable $B, the variable $!,
and the literal ^[(B. Both variables evidently interpolated as empty
strings here, which is why the output is ^[^[(B.
And ^[ is CIRCUMFLEX ACCENT + LEFT SQUARE BRACKET,
not the control character ESCAPE.
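A minimal Perl sketch of the interpolation described above, assuming an ASCII platform and no `use encoding`. The `our $B;` declaration and the reset of `$!` to 0 are assumptions that stand in for the original script's state (an undefined `$B` and an empty `$!`); they are only there so the snippet runs under `strict`.

```perl
use strict;
use warnings;

our $B;    # hypothetical stand-in: left undefined, as in the original script

# Double quotes interpolate $B and $!, so this is '^[' . $B . $! . '^[(B'.
my $interpolated = do {
    no warnings 'uninitialized';    # $B is undef
    $! = 0;                         # errno 0: $! stringifies as ""
    "^[$B$!^[(B";
};

# Single quotes keep every character literally, including $B and $!.
my $literal = '^[$B$!^[(B';

print "interpolated: $interpolated\n";    # interpolated: ^[^[(B
print "literal:      $literal\n";         # literal:      ^[$B$!^[(B
```

Note that neither version contains a real ESCAPE byte; ^[ is just two graphic characters in both.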
> On ebcdic, I get :
> Malformed UTF-8 character (unexpected end of string)
> at /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl
> line 330.
> Malformed UTF-8 character (unexpected continuation
> byte 0x6a, with no preceding start byte) in pattern
> match (m//) at
> /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl line
> 337.
> Malformed UTF-8 character (unexpected continuation
> byte 0x6a, with no preceding start byte) in pattern
> match (m//) at
> /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl line
> 337.
>
> -- and some junk data.
>
> It seems that in "$B$!^[(B" above, $! and ^[ are
> incorrect two-byte sequences on EBCDIC. However, $!
> does not translate into printable characters on cp-1047.
> What do we replace them with?
According to JIS X 0208:1997 Appendix 2 (which specifies ISO-2022-JP),
the escape sequences for ISO-2022-JP are "\x1B\x28\x42", "\x1B\x28\x4A",
"\x1B\x24\x40", and "\x1B\x24\x42".
ASCII graphic representations such as "\e$B" are not portable
to EBCDIC, even though they are widely used in the ASCII world.
In EBCDIC, ESCAPE "\e" is not \x1B but \x27, DOLLAR $ is not \x24
but \x5B, and CAPITAL B is not \x42 but \xC2. So don't write escape
sequences with the graphic characters that correspond to them in ASCII.
If I understand it correctly, an escape sequence is a sequence of
bit combinations (7-bit or 8-bit), not a sequence of graphic characters;
an escape sequence is encoded neither in ASCII nor in EBCDIC.
(Note that I am referring to JIS X 0202, the Japanese standard translation,
rather than the original ISO/IEC 2022.)
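As a sketch of working with bit combinations rather than graphic characters, the Perl snippet below builds the string from the original example (ESC $ B, the JIS X 0208 code 0x2421 written as "$!" in the script, ESC ( B) from hex escapes and decodes it with Encode's `decode`, without `use encoding`. It assumes an ASCII build of Perl with the Encode distribution installed; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Build the ISO-2022-JP octets from bit combinations:
#   ESC $ B    switch to JIS X 0208-1983
#   0x24 0x21  JIS X 0208 code 0x2421 (the "$!" of the original script)
#   ESC ( B    switch back to ASCII
my $octets = "\x1B\x24\x42" . "\x24\x21" . "\x1B\x28\x42";

# Decode to a Perl character string; no `use encoding` is involved.
my $text = decode('iso-2022-jp', $octets);

printf "%d character(s), U+%04X\n", length $text, ord $text;
# 1 character(s), U+3041 (HIRAGANA LETTER SMALL A)
```

Keeping the escape sequences as explicit bytes and decoding them explicitly avoids relying on how the source file's graphic characters map to bytes.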
> I tested again with :
> ---------------------------------
> use encoding 'iso-2022-jp';
> $a = "$B&&(B"; # && is \x50\x50 on EBCDIC, which is
>                # valid according to jis0208.ucm
> print "a : $a\n";
> ----------------------------------
>
> But I still get the messages above and some junk
> data in $a, which I don't think is the correct output.
Since Encode.pm is a CPAN module, bugs in it should probably be
reported to the module's maintainer rather than to
the perl5-porters mailing list.
The site rt.cpan.org accepts bug reports for every distribution
released through CPAN:
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Encode
Regards,
SADAHIRO Tomoyuki