Robert Allerstorfer <[EMAIL PROTECTED]> writes:
>Hello,
>
>I want to convert source code written in the Japanese shift_jis
>character set, into their Unicode numbers. For instance, "検" should
>result in "U+691C" (which is 26908 in decimal). I tried using the
>Encode module of Perl 5.8 with something like this:
>
>  use Encode::JP;
>  my $string = "検";
>  Encode::from_to($string, "shiftjis", "utf8");
>  my $ord = join("\n", unpack('U*', $string));
>  print "$string\n$ord";

from_to does what it says. In that case you took shiftjis, decoded it
to Unicode, then re-encoded it as UTF-8 octets.
What you might have meant was to get Unicode rather than the re-encoded
form:

  use Encode::JP;
  my $string = "検";
  Encode::from_to($string, "shiftjis", "Unicode");
  binmode STDOUT,':utf8';
  print length($string)," chars '$string'\n";
  my $ord = join("\n", map( ord($_),split(//,$string)));
  print "$ord";

>
>But, this gives a 3-character string "怜" (with the decimal values
>230, 164 and 156). Could anyone please point me to the right direction
>on how to get the decimal number 26908 instead?
>
>Thanks in advance.

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
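
The distinction drawn in the reply (a decoded character string versus re-encoded octets) can also be sketched with Encode's decode function, which returns characters and leaves out the re-encoding step that from_to performs. This is a minimal sketch under some assumptions, not necessarily what the reply's author would write: sjis_code_points is a hypothetical helper name, and the input is spelled as the byte escapes "\x8C\x9F" (the Shift_JIS octets of the example character) so the script does not depend on the encoding of the source file. The "Unicode" target name passed to from_to in the reply may not be recognised by later Encode releases, which is one reason to prefer decode here.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode): decode Shift_JIS octets into
# a character string and return the code point of each character.
sub sjis_code_points {
    my ($octets) = @_;
    # decode() converts octets to Perl characters; unlike from_to(),
    # it does not re-encode the result into another byte encoding.
    my $chars = decode('shiftjis', $octets);
    return map { ord } split //, $chars;
}

# 0x8C 0x9F is the Shift_JIS byte sequence for the example character.
my @code_points = sjis_code_points("\x8C\x9F");

# Show each code point in U+XXXX notation and in decimal.
printf "U+%04X (%d)\n", $_, $_ for @code_points;
# prints: U+691C (26908)
```

When the input comes from a file rather than a literal, the same helper can be applied to each line read in raw (undecoded) mode, since it expects octets rather than already-decoded characters.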