On Thu, 10 Jan 2002 19:50:10 +0900 Dan Kogai <[EMAIL PROTECTED]> wrote:
> Bad news. It's gotten worse on the latest DEVEL14150. It completely
> ignores 2byte chars. Here is the detailed research.
>
> I used MacOS 10.1.2 for 5.7.2 and FreeBSD 4.5-stable for DEVEL14150
> (5.7.2 didn't just compile on FreeBSD; I think it's a known fact).
>
> # first let's see if conventional method works
> perl -MJcode -ple '$_=jcode($_,'euc')->utf8' table.euc > table.utf8
> # table.euc is a euc-jp encoded text that contains all ascii, JISX0201
> # (aka Hankaku Kana) and JISX0208
>
> Now comes Encode module of 5.7.2
>
> # see the previous mail for classic.pl
> ../classic.pl -d table.euc camel572.utf8
> ../classic.pl -e table.utf8 camel572.euc
>
> Voila! diff -u table.utf8 camel572.utf8 gives me an empty string! They
> are completely identical. Bad news is that encoding back to euc is the
> trash. Half way it would be it worked.
>
> Now DEVEL14150. Decode worked fine like 5.7.2 but when you try to
> encode from utf8 to euc-jp, perl croaks with;
>
> euc-jp '[non-printable garbage]' does not map to UTF-8 at
> /home/dankogai/perl/lib/5.7.2/i386-freebsd-multi-64int/Encode/Tcl.pm
> line 228

I guess the SVf_UTF8 flag is off on that string, most likely because
the UTF-8 layer was not used. (However, the "euc-jp .. does not map to
UTF-8" error message is one that should be shown when decoding to
Unicode.)

Please refer to the PerlIO manpage for details; we declare that a
stream takes a Unicode sequence like this:

    binmode(FILEHANDLE, ":utf8");

or through the open() function.

Bleadperl has *many, many* docs on Unicode: perluniintro, perlunicode,
lib/utf8, etc.

I'd be glad if this would help you:
http://homepage1.nifty.com/nomenclator/perl/unicode.htm (in Japanese)
It gives a brief overview of Perl's Unicode support, including a short
comparison of the differences between Perl 5.7 and 5.6.

> Now I am tempted to implement toplevel Encode myself....
>
> Also, 5.7.2 and its variants appear pretty unstable. Let me see if
> Encode itself can work on 5.6.1 as well (should be, it's under ext/
> directory after all. A little tweak on compile scripts would be needed,
> however).
>
> Dan the Man with Too Many Charsets to Handle

Encode::Tcl should work on Perl 5.6 as it is pure Perl; however, it is
very slow, as you pointed out, and therefore not very practical to use.
There is much room for improvement.

Regards,
SADAHIRO Tomoyuki
URL: http://homepage1.nifty.com/nomenclator/
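
P.S. To illustrate the ":utf8" layer mentioned above, here is a minimal
sketch. It is not the classic.pl script from this thread, and it rests
on assumptions: a later Perl (5.8 or newer) whose Encode module exports
decode() and encode() and knows the name 'euc-jp', which may differ
from the interface in 5.7.2 or DEVEL14150. The sample bytes and all
variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);     # assumption: Encode as shipped with Perl 5.8+
use File::Temp qw(tempfile);

# Hypothetical sample data: three kanji as raw EUC-JP bytes.
my $euc_bytes = "\xC6\xFC\xCB\xDC\xB8\xEC";

# Decode the bytes into a character string; the result has SVf_UTF8 on.
my $chars = decode('euc-jp', $euc_bytes);

# Create a scratch file, then reopen it with the :utf8 layer via open().
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "close: $!";

open(my $out, '>:utf8', $path) or die "open for write: $!";
print {$out} $chars, "\n";
close $out or die "close: $!";

# Read it back, this time declaring the layer with binmode().
open(my $in, '<', $path) or die "open for read: $!";
binmode($in, ':utf8');
my $line = <$in>;
close $in or die "close: $!";
chomp $line;

# Encode the characters back to EUC-JP and compare with the input bytes.
my $round_trip = encode('euc-jp', $line);
print $round_trip eq $euc_bytes ? "round trip ok\n" : "round trip differs\n";
```

Without the layer on the input handle, $line would hold raw UTF-8
octets with the flag off, which is the situation I guessed at above.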