Autrijus Tang <[EMAIL PROTECTED]> writes:
>Also, Encode.pm seems unable to handle '00xy' in the map, where 'x' has its
>highest bit set. There are six such places:
>
>Big5   UCS2   Charname
>-----------------------------
>A150   00B7   MIDDLE DOT
>A1B1   00A7   SECTION SIGN
>A1D1   00D7   MULTIPLICATION SIGN
>A1D2   00F7   DIVISION SIGN
>A1D3   00B1   PLUS-MINUS SIGN
>A258   00B0   DEGREE SIGN
>
>For example, decode('big5', "\xA1\x50") simply equals to "\xB7", instead
>of the required "\xC2\xB7" UTF-8 expansion form. Can this be fixed?
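
The "\xB7" you get back from decode() is a single character, U+00B7 MIDDLE DOT,
which is what the map says it should be. What you see in Perl is the Unicode
code point number, _NOT_ the UTF-8 encoding. If you want the UTF-8 octet
sequence, you need to call encode('UTF-8', ...) (or one of the shortcuts for
that).

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/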
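A minimal Perl sketch of the distinction, assuming a perl with the core Encode module (where 'big5' is an alias resolved to Encode::TW's big5-eten table). The variable names are illustrative only. It shows that the decoded string holds code points, and that the "\xC2\xB7" octets only appear after an explicit encode(). encode_utf8, utf8::encode and an :encoding(UTF-8) output layer are shown as examples of "shortcuts". That list is my reading of the phrase, not something spelled out in the reply.

```perl
use strict;
use warnings;
use Encode qw(decode encode encode_utf8);

# decode() turns Big5 octets into a Perl character string. Each element of
# that string is a Unicode code point, not a UTF-8 octet.
my $text = decode('big5', "\xA1\x50");
printf "decoded: %d character(s), first code point U+%04X\n",
    length($text), ord($text);

# A string holding just U+00B7 MIDDLE DOT, written out by code point.
my $middle_dot = "\x{B7}";

# Explicit conversion to UTF-8 octets: this is where "\xC2\xB7" appears.
my $octets = encode('UTF-8', $middle_dot);
printf "UTF-8 octets: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $octets;

# Shortcut: Encode::encode_utf8 (lax "utf8" rather than strict "UTF-8",
# but it gives the same octets for this character).
my $octets_short = encode_utf8($middle_dot);

# Shortcut: utf8::encode is built into perl and converts a string in place,
# so work on a copy.
my $copy = $middle_dot;
utf8::encode($copy);

# Shortcut: let an output layer do the encoding when printing.
binmode(STDOUT, ':encoding(UTF-8)');
print $middle_dot, "\n";
```

In all three encoded forms the string is now two octets long, while $middle_dot itself stays one character long. That is why comparing decode() output against "\xC2\xB7" fails.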