Bug#466341: Some ISO-2022-JP text cannot be roundtripped

2009-04-04, Niko Tyni
found 466341 5.10.0-19
retitle 466341 support the Encode::decode CHECK argument with ISO-2022-JP
severity 466341 wishlist
thanks

On Mon, Feb 18, 2008 at 01:36:55AM -0500, Bryan Donlan wrote:
> Package: perl
> Version: 5.8.8-12
> Severity: normal
>
> Converting a certain sequence of ISO-2022-JP text to utf8 succeeds:
> $ perl -MEncode -e '$s = "{\x1b\x24\x42\x2d)\x1b(B}"; print encode("utf8", decode("iso-2022-jp", $s, Encode::FB_CROAK)), "\n"'
> {⑨}
>
> However, converting it back to ISO-2022-JP fails:
> $ perl -MEncode -e '$s = "{\x1b\x24\x42\x2d)\x1b(B}"; print encode("iso-2022-jp", decode("iso-2022-jp", $s, Encode::FB_CROAK)), "\n"'
> {\x{2468}}
>
> It should be noted that iconv rejects this entirely:
> $ perl -MEncode -e '$s = "{\x1b\x24\x42\x2d)\x1b(B}"; print $s, "\n"' | iconv -f iso-2022-jp -t utf8
> {iconv: illegal input sequence at position 4
>
> However, if this is truly invalid iso-2022-jp, perl should croak on it, since
> FB_CROAK was passed.

It's indeed an invalid sequence; iconv is right about that. The original
JIS C 6226 (a.k.a. JIS X 0208) standard can be found at, e.g., [1], and it
does not contain 0x2d 0x29, the two-byte sequence embedded in your
ISO-2022-JP example. The first byte, 0x2d, corresponds to row 13, which the
standard leaves unassigned.
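To show why iconv stops where it does, here is a minimal Python sketch. It is not Perl's Encode and is not code from this thread. It walks 7-bit ISO-2022-JP input and tracks the ESC ( B / ESC $ B designations. It then reports the offset of the first two-byte pair whose row (first byte minus 0x20) is unassigned in JIS X 0208. The names `first_unassigned` and `row_assigned` are hypothetical. The sketch handles only a few escape sequences and checks rows, not individual cells, so it is a coarse approximation of a real validator.

```python
from typing import Optional

ESC = 0x1B

# Designation sequences this sketch understands (a subset of ISO-2022-JP).
SINGLE_BYTE = {b"\x1b(B", b"\x1b(J"}  # ASCII, JIS X 0201 Roman
DOUBLE_BYTE = {b"\x1b$@", b"\x1b$B"}  # JIS C 6226-1978, JIS X 0208:1983


def row_assigned(row: int) -> bool:
    """Coarse check: JIS X 0208 assigns rows 1-8 and 16-84; rows 9-15 are empty.

    Assumption: gaps inside assigned rows are ignored and editions are not
    distinguished, so this only approximates the real character set.
    """
    return 1 <= row <= 8 or 16 <= row <= 84


def first_unassigned(data: bytes) -> Optional[int]:
    """Return the offset of the first two-byte pair in an unassigned row, or None."""
    double = False  # True while a two-byte set is designated
    i = 0
    while i < len(data):
        if data[i] == ESC:
            seq = data[i:i + 3]
            if seq in SINGLE_BYTE:
                double = False
            elif seq in DOUBLE_BYTE:
                double = True
            else:
                raise ValueError(f"unsupported escape sequence at offset {i}")
            i += 3
        elif double:
            if i + 1 >= len(data):
                return i  # truncated pair at end of input
            row = data[i] - 0x20  # e.g. 0x2d -> row 13
            if not row_assigned(row):
                return i
            i += 2
        else:
            i += 1  # single-byte character
    return None


print(first_unassigned(b"{\x1b$B-)\x1b(B}"))  # prints 4
```

For the report's string this returns 4, the same offset iconv reports, because the pair starts with 0x2d (row 13).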

The bug here seems to be that the corresponding Encode module ignores
the CHECK argument. The Encode documentation states:

  NOTE: Not all encoding support this feature
    Some encodings ignore CHECK argument.  For example, Encode::Unicode
    ignores CHECK and it always croaks on error.

Since this limitation is documented, I'm lowering the severity to wishlist.

[1] http://www.itscj.ipsj.or.jp/ISO-IR/087.pdf

Cheers,
-- 
Niko Tyni   nt...@debian.org



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



2008-02-17, Bryan Donlan
Package: perl
Version: 5.8.8-12
Severity: normal

Converting a certain sequence of ISO-2022-JP text to utf8 succeeds:
$ perl -MEncode -e '$s = "{\x1b\x24\x42\x2d)\x1b(B}"; print encode("utf8", decode("iso-2022-jp", $s, Encode::FB_CROAK)), "\n"'
{⑨}

However, converting it back to ISO-2022-JP fails:
$ perl -MEncode -e '$s = "{\x1b\x24\x42\x2d)\x1b(B}"; print encode("iso-2022-jp", decode("iso-2022-jp", $s, Encode::FB_CROAK)), "\n"'
{\x{2468}}

It should be noted that iconv rejects this entirely:
$ perl -MEncode -e '$s = "{\x1b\x24\x42\x2d)\x1b(B}"; print $s, "\n"' | iconv -f iso-2022-jp -t utf8
{iconv: illegal input sequence at position 4

However, if this is truly invalid iso-2022-jp, perl should croak on it, since
FB_CROAK was passed.
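The expectation above can be phrased as a strict round-trip test: decode, re-encode, and compare, treating any codec error as failure. Below is a minimal sketch in Python, not Perl. It assumes Python's `"strict"` error mode is a reasonable stand-in for passing `Encode::FB_CROAK`. The helper name `roundtrips` is hypothetical and is not part of either language's API. By the report's reasoning, its input should make this check return False rather than decode to ⑨.

```python
def roundtrips(data: bytes, encoding: str) -> bool:
    """Return True if data decodes and re-encodes to exactly the same bytes.

    Both steps use the "strict" error handler (assumed analogue of FB_CROAK):
    malformed or unmappable input raises instead of being substituted.
    """
    try:
        text = data.decode(encoding, "strict")
        return text.encode(encoding, "strict") == data
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False


print(roundtrips(b'\x1b$B$"\x1b(B', "iso2022_jp"))  # HIRAGANA LETTER A
```

A byte-for-byte comparison is stricter than a validity check. Valid input that uses a different but equivalent designation (for example ESC ( J where the encoder emits ESC ( B) could also be reported as not round-tripping.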

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.18.8-domU-linode7 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages perl depends on:
ii  libc6           2.7-6       GNU C Library: Shared libraries
ii  libdb4.6        4.6.21-5    Berkeley v4.6 Database Libraries [
ii  libgdbm3        1.8.3-3     GNU dbm database routines (runtime
ii  perl-base       5.8.8-12    The Pathologically Eclectic Rubbis
ii  perl-modules    5.8.8-12    Core Perl modules

Versions of packages perl recommends:
ii  perl-doc        5.8.8-12    Perl documentation

-- no debconf information
