On Thu, 10 Jan 2002 19:50:10 +0900
Dan Kogai <[EMAIL PROTECTED]> wrote:

> Bad news.  It's gotten worse on the latest DEVEL14150.  It completely 
> ignores 2byte chars.  Here is the detailed research.
> 
> I used MacOS 10.1.2 for 5.7.2 and FreeBSD 4.5-stable for DEVEL14150 
> (5.7.2 just didn't compile on FreeBSD; I think it's a known fact).
> 
> # first let's see if conventional method works
> perl -MJcode -ple '$_=jcode($_,"euc")->utf8' table.euc > table.utf8
> # table.euc is a euc-jp encoded text that contains all ascii, JISX0201
> # (aka Hankaku Kana) and JISX0208


> Now comes Encode module of 5.7.2
> 
> # see the previous mail for classic.pl
> ../classic.pl -d table.euc camel572.utf8
> ../classic.pl -e table.utf8 camel572.euc
> 
> Voila!  diff -u table.utf8 camel572.utf8 gives me empty output!  They
> are completely identical.  The bad news is that encoding back to euc
> produces garbage.  So it works only halfway.

> Now DEVEL14150.  Decoding worked fine, as in 5.7.2, but when you try to
> encode from utf8 to euc-jp, perl croaks with:
> 
> euc-jp '[non-printable garbage]' does not map to UTF-8 at 
> /home/dankogai/perl/lib/5.7.2/i386-freebsd-multi-64int/Encode/Tcl.pm 
> line 228

I guess the SVf_UTF8 flag is off on that string,
probably because the UTF-8 layer was not used.
(Though the "euc-jp .. does not map to UTF-8" error message
 is one that ought to be shown when decoding to Unicode.)
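
As a minimal sketch of the flag behaviour, assuming a current Encode
that provides decode()/encode() and a perl that provides
utf8::is_utf8() (the Encode shipped with 5.7.2 may differ), and using
sample bytes chosen only for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Illustrative input: the UTF-8 bytes of U+3042 (HIRAGANA LETTER A),
# as they would arrive from a filehandle that has no :utf8 layer.
my $bytes = "\xE3\x81\x82";

# Undecoded, perl treats this as three separate bytes; the flag is off.
my $flag_before = utf8::is_utf8($bytes) ? 1 : 0;

# Decoded, perl treats it as one character; the flag is on.
my $chars = decode('UTF-8', $bytes);
my $flag_after = utf8::is_utf8($chars) ? 1 : 0;

printf "before: flag=%d length=%d\n", $flag_before, length $bytes;
printf "after:  flag=%d length=%d\n", $flag_after,  length $chars;

# Encoding to euc-jp is meaningful only on the decoded character string.
my $euc = encode('euc-jp', $chars);
printf "euc-jp: %s\n", join ' ', map { sprintf '%02X', ord } split //, $euc;
```

If the string is passed to the encoder while the flag is still off,
each byte is taken as a separate character, which would be consistent
with the garbage described in the report.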

Please refer to the PerlIO manpage for details;
we should declare that the stream carries a Unicode sequence,
like this: binmode(FILEHANDLE, ":utf8");
or through the open() function.
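
Both ways of declaring the layer can be sketched as follows. This is
only an illustration: the file name is hypothetical, and the
three-argument open() with a lexical filehandle is assumed to be
available.

```perl
use strict;
use warnings;

# Hypothetical scratch file, used only for this illustration.
my $path = "layer-demo.$$.txt";

# Declare the layer at open() time for output...
open(my $out, '>:utf8', $path) or die "Cannot write $path: $!";
print {$out} "\x{3042}\n";    # U+3042 goes out as UTF-8 bytes
close($out) or die "Cannot close $path: $!";

# ...or add it to an already opened handle with binmode().
open(my $in, '<', $path) or die "Cannot read $path: $!";
binmode($in, ':utf8');
my $line = <$in>;
close($in);
unlink $path;

chomp $line;
printf "characters=%d codepoint=U+%04X\n", length $line, ord $line;
```

With the layer in place, the data read back is a character string, so
length() counts characters rather than bytes.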

Bleadperl has *many, many* docs on Unicode:
perluniintro, perlunicode, lib/utf8, etc.

I would be glad if this page helps you:

http://homepage1.nifty.com/nomenclator/perl/unicode.htm
(in Japanese)

It gives a brief overview of Perl's Unicode support,
including a short comparison of the differences
between Perl 5.7 and 5.6.

> Now I am tempted to implement toplevel Encode myself....
> 
> Also, 5.7.2 and its variants appear pretty unstable.  Let me see if 
> Encode itself can work on 5.6.1 as well (should be, it's under ext/ 
> directory after all.  A little tweak to the compile scripts would be needed,
> however).

> Dan the Man with Too Many Charsets to Handle

Encode::Tcl should work on Perl 5.6 since it is pure Perl;
however, it is very slow, as you pointed out,
and therefore not very practical to use.
There is much room for improvement.

Regards,
SADAHIRO Tomoyuki
URL: http://homepage1.nifty.com/nomenclator/
