Re: converting Japanese chars into their Unicode values using 5.8's Encode

Nick Ing-Simmons Thu, 19 Sep 2002 05:10:49 -0700

Robert Allerstorfer <[EMAIL PROTECTED]> writes:
>Hello,
>
>I want to convert source code written in the Japanese shift_jis
>character set, into their Unicode numbers. For instance, "検" should
>result in "U+691C" (which is 26908 in decimal). I tried using the
>Encode module of Perl 5.8 with something like this:
>
>        use Encode::JP;
>        my $string = "検";
>        Encode::from_to($string, "shiftjis", "utf8");
>        my $ord = join("\n", unpack('U*', $string));
>        print "$string\n$ord";


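from_to does what it says: it converts octets in one encoding into
octets in another. In your case it took the shiftjis octets, decoded
them to Unicode, and then re-encoded the result as UTF-8 octets, which
is why you end up with three bytes rather than one character.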
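
To make the difference concrete, here is a minimal sketch that contrasts
the two operations. It assumes the input is the two shiftjis octets
0x8C 0x9F (the character U+691C from the question), written as hex
escapes so the example does not depend on how the source file is saved.
The variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode from_to);

# Shift_JIS octets for U+691C, written as escapes (assumption: this is
# the character from the question).
my $sjis = "\x8c\x9f";

# from_to converts in place, octets to octets: the result is the UTF-8
# encoding of the character, i.e. three separate bytes.
my $octets = $sjis;
from_to($octets, "shiftjis", "utf8");
my @bytes = unpack('C*', $octets);

# decode stops after the first step and returns Perl characters.
my $chars = decode("shiftjis", $sjis);
my @codepoints = map { ord($_) } split //, $chars;

print "octets:      @bytes\n";
print "code points: @codepoints\n";
```

The first list holds the byte values of the UTF-8 form, the second the
single code point you were after.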

What you probably meant was to get Unicode characters rather than the
re-encoded octets, i.e. to decode only:

        use Encode qw(decode);
        my $octets = "\x8c\x9f";    # shiftjis octets for U+691C
        my $string = decode("shiftjis", $octets);    # octets -> characters
        binmode STDOUT, ':utf8';    # emit characters as UTF-8 on output
        print length($string), " chars '$string'\n";
        my $ord = join("\n", map { ord($_) } split(//, $string));
        print "$ord\n";
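
The question also asked for the "U+691C" form, not only the decimal
value. A small sketch of that formatting step, assuming the string has
already been decoded into Perl characters. The helper name `codepoints`
is hypothetical and not part of Encode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return "U+XXXX (decimal)" for each character
# of an already decoded string.
sub codepoints {
    my ($chars) = @_;
    return map { sprintf("U+%04X (%d)", ord($_), ord($_)) } split //, $chars;
}

# Shift_JIS octets for U+691C, written as escapes and kept in a
# variable because decode may modify its input.
my $octets = "\x8c\x9f";
my $chars  = decode("shiftjis", $octets);
print "$_\n" for codepoints($chars);
```

sprintf's %04X pads to at least four hex digits, which matches the usual
U+ notation for characters in the Basic Multilingual Plane.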




>
>But, this gives a 3-character string "æ€œ" (with the decimal values
>230, 164 and 156). Could anyone please point me to the right direction
>on how to get the decimal number 26908 instead?
>
>Thanks in advance.
-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
