Robert Allerstorfer <[EMAIL PROTECTED]> writes:
>Hello,
>
>I want to convert source code written in the Japanese shift_jis
>character set, into their Unicode numbers. For instance, "ŒŸ" should
>result in "U+691C" (which is 26908 in decimal). I tried using the
>Encode module of Perl 5.8 with something like this:
>
> use Encode::JP;
> my $string = "ŒŸ";
> Encode::from_to($string, "shiftjis", "utf8");
> my $ord = join("\n", unpack('U*', $string));
> print "$string\n$ord";
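
from_to does what it says: it converts a string of octets from one
encoding to another, in place. In your case it took the shiftjis octets,
decoded them to Unicode, then re-encoded the result as UTF-8 octets, so
you end up with the three UTF-8 bytes rather than one character.
What you probably meant was to get the Unicode characters rather than the
re-encoded form, which is what decode() gives you:

  use Encode qw(decode);
  my $string = "ŒŸ";                       # the shiftjis octets
  $string = decode("shiftjis", $string);   # octets -> Unicode characters
  binmode STDOUT, ':utf8';
  print length($string), " chars '$string'\n";
  my $ord = join("\n", map { ord($_) } split(//, $string));
  print "$ord\n";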
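
If you want the "U+691C" notation itself, something along these lines
should work. It is only a sketch: sjis_to_codepoints is a name made up
for illustration, and the input is written as "\x8C\x9F" (the two shiftjis
octets from your example) so the script does not depend on the encoding
of its own source file.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode): decode shiftjis octets and
# return each character's code point in "U+XXXX" notation.
sub sjis_to_codepoints {
    my ($octets) = @_;
    my $chars = decode('shiftjis', $octets);    # octets -> character string
    return map { sprintf('U+%04X', ord $_) } split //, $chars;
}

# The two shiftjis octets from the question, written as escapes.
print join("\n", sjis_to_codepoints("\x8C\x9F")), "\n";   # prints U+691C
```

Use sprintf('%d', ord $_) instead if you want the decimal value (26908).
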
>
>But, this gives a 3-character string "æ¤œ" (with the decimal values
>230, 164 and 156). Could anyone please point me to the right direction
>on how to get the decimal number 26908 instead?
>
>Thanks in advance.
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/