Autrijus Tang <[EMAIL PROTECTED]> writes:
>Also, Encode.pm seems unable to handle '00xy' in the map, where 'x' has its
>highest bit set. There are six such places:
>
>Big5 UCS2 Charname
>-----------------------------
>A150 00B7 MIDDLE DOT
>A1B1 00A7 SECTION SIGN
>A1D1 00D7 MULTIPLICATION SIGN
>A1D2 00F7 DIVISION SIGN
>A1D3 00B1 PLUS-MINUS SIGN
>A258 00B0 DEGREE SIGN
>
>For example, decode('big5', "\xA1\x50") simply equals to "\xB7", instead
>of the required "\xC2\xB7" UTF-8 expansion form. Can this be fixed?

What you see in perl is the Unicode code point number, _NOT_ the UTF-8
encoding. If you want the UTF-8 octet sequence, you need to encode('UTF-8', ...)
(or use one of the shortcuts for that).
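
A minimal sketch of the distinction, assuming a perl with the Encode module and its Big5 tables installed. The variable names are only illustrative. It decodes the two Big5 octets from the example into a character string, prints that string's length and code point, and then encodes it explicitly to get UTF-8 octets:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decode the two Big5 octets into a Perl character string.
my $chars = decode('big5', "\xA1\x50");

# The string holds characters, not octets: ord() reports the
# code point of the first character, not a UTF-8 byte.
printf "length=%d code point=U+%04X\n", length($chars), ord($chars);

# Encode explicitly to obtain the UTF-8 octet sequence.
my $octets = encode('UTF-8', $chars);

# Print each octet of the encoded string in hex.
printf "octets=%s\n", join ' ', map { sprintf '%02X', ord } split //, $octets;
```

Going by the table quoted above, the decoded string should be the single character U+00B7, and encoding it should give the two octets C2 B7. Encode also exports encode_utf8(), which is probably one of the shortcuts meant here.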

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/


