On Sat, 02 Dec 2006 12:00:12 -0800, you wrote:

>On a Japanese version of Windows when you execute a Perl to run a script,
>the length() fcn returns the wrong number of characters for anything you
>pass in as @ARGV[0], and the split() fcn seems to work the same way.
>
>Using some of the samples shows in perluniintro we do not get the same
>results, so something is wrong.
>
>Using ActivePerl 5.8.8 Build 819. Using Win2003 Server, Japanese. No
>emulation, all default Japanese installation.
>
>Here is what we are doing:
>
>perl script.pl テスト
>
>(there are three characters for @ARGV[0], the Japanese word for 'test')
>
>The perl script does this:
>
>print length(@ARGV[0]); # returns 6
>
>If one tries to use split(\\, @ARGV[0]) there are 6 iterations.
>
>Tried use encoding UTF8, the -C6 flag and a ton of other stuff.
>Oddly, if one does 'print @ARGV[0]' the output is テスト.
>
>Even used something from perluniintro:
>$Unicode_string = pack("U*", unpack("W*", $ARGV[0]));
>print $Unicode_string # returns テスト
>print length($Unicode_string) # returns 6
>
>We need to capture each character in テスト (3 of them) and get the HEX
>or UNICODE value for the character. Since Perl thinks the length is 6 we
>cannot get correct hex/unicode values using pack/unpack or anything else
>for that matter.

I may be missing something, but wouldn't -CA or -C32 do what you want?
According to perlrun, it means "the elements of @ARGV are strings encoded
in UTF-8".

-- 
Eric Amick
Columbia, MD
_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
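
For illustration, here is a minimal sketch of decoding an argument by hand with the core Encode module and printing one code point per character. The helper name code_points is hypothetical, and the byte strings are hard-coded stand-ins for $ARGV[0]. The thread does not say which encoding the arguments actually arrive in. -CA assumes UTF-8, in which テスト is 9 bytes; a length of 6 would be consistent with code page 932 (2 bytes per character), but that is an assumption, not something confirmed above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes using the named encoding and
# return one "U+XXXX" string per decoded character.
sub code_points {
    my ($bytes, $encoding) = @_;
    my $chars = decode($encoding, $bytes);
    return map { sprintf('U+%04X', ord($_)) } split //, $chars;
}

# Stand-ins for $ARGV[0]: the bytes of テスト in two encodings, written
# as escapes so the encoding of this source file does not matter.
my $as_utf8  = "\xE3\x83\x86\xE3\x82\xB9\xE3\x83\x88";   # 9 bytes
my $as_cp932 = "\x83\x65\x83\x58\x83\x67";               # 6 bytes

for my $case ([ 'UTF-8', $as_utf8 ], [ 'cp932', $as_cp932 ]) {
    my ($encoding, $bytes) = @$case;
    my @cps = code_points($bytes, $encoding);
    printf "%s: %d bytes, %d characters: %s\n",
        $encoding, length($bytes), scalar(@cps), "@cps";
}
```

In a real script the call would be code_points($ARGV[0], $encoding), with $encoding chosen to match whatever the shell actually passes in. Printing length($ARGV[0]) before decoding is a quick way to check which of the two cases applies.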