Re: Japanese chars and ARGV

Eric Amick Sat, 02 Dec 2006 13:47:57 -0800

On Sat, 02 Dec 2006 12:00:12 -0800, you wrote:

>On a Japanese version of Windows, when you run a Perl script, the length()
>function returns the wrong number of characters for anything you pass in as
>$ARGV[0], and the split() function seems to behave the same way.
>
>Using some of the samples shown in perluniintro, we do not get the same
>results, so something is wrong.
>
>Using ActivePerl 5.8.8 Build 819. Using Win2003 Server, Japanese. No
>emulation, all default Japanese installation.
>
>Here is what we are doing:
>
>perl script.pl テスト
>
>(there are three characters for @ARGV[0], the Japanese word for 'test')
>
>The perl script does this:
>
>print length(@ARGV[0]);  # returns 6
>
>If one tries to use split(//, @ARGV[0]) there are 6 iterations.
>
>Tried use encoding UTF8, the -C6 flag and a ton of other stuff.
>Oddly, if one does 'print @ARGV[0]' the output is テスト.
>
>Even used something from perluniintro:
>$Unicode_string = pack("U*", unpack("W*", $ARGV[0]));
>print $Unicode_string         # returns テスト
>print length($Unicode_string) # returns 6
>
>We need to capture each character in テスト (3 of them) and get the hex or
>Unicode value for each character. Since Perl thinks the length is 6, we cannot
>get correct hex/Unicode values using pack/unpack or anything else, for that
>matter.


I may be missing something, but wouldn't -CA or -C32 do what you want?
According to perlrun, it means "the elements of @ARGV are strings
encoded in UTF-8".
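
For what it's worth, here is a rough sketch of doing the same decoding by
hand with the core Encode module, in case the switch alone doesn't help. The
helper names decode_arg and code_points are mine, not anything from perlrun
or perluniintro, and the 'UTF-8' encoding name is an assumption: a length of
6 for three characters looks like a two-byte encoding such as Shift-JIS
(UTF-8 would give 9 bytes), in which case 'cp932' might be the name to try
instead. Run it without -CA so the argument isn't decoded twice.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode one raw command-line argument (bytes) into a Perl character string.
# $encoding is an assumption: 'UTF-8' matches what -CA expects, while a
# Japanese Windows console may hand over 'cp932' (Shift-JIS) bytes instead.
sub decode_arg {
    my ($bytes, $encoding) = @_;
    return decode($encoding, $bytes);
}

# Return the code point of each character as an uppercase hex string.
sub code_points {
    my ($string) = @_;
    return map { sprintf '%04X', ord($_) } split //, $string;
}

if (@ARGV) {
    my $arg = decode_arg($ARGV[0], 'UTF-8');   # hypothetical choice of encoding
    print length($arg), "\n";                  # character count, not byte count
    print join(' ', code_points($arg)), "\n";  # one hex code point per character
}
```

If the argument decodes correctly to テスト, I'd expect a length of 3 and the
code points 30C6 30B9 30C8.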
-- 
Eric Amick
Columbia, MD
_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
