On Mon, 3 Oct 2005 07:13:15 -0700 (PDT), rajarshi das <[EMAIL PROTECTED]> wrote
> Hi,
> The following unicode folding test fails on EBCDIC
> (perl-5.8.6) :
>
> $a = '0178';
> $b = '00FF';
>
> $a1 = pack("U0U*", hex $code);
> $b1 = pack("U0U*", map { hex } split " ", $mapping);
>
> if (":$b1:" =~ /:[$a1]:/i) {
> print "ok\n";
> }

I guess $code is $a and $mapping is $b...

> Alternately, if $a = '0178', and $b = '00DF', the test
> passes.
>
> Why is this so ?
> Is it because \xFF as a border case ( 1 less than 256)
> is not properly handled ?

0xDF in IBM 1047 and some other EBCDIC encodings is ÿ (that is, y with
diaeresis), which corresponds to U+00FF, and its uppercase is U+0178.

How about $a = '039C' and $b = '00A0' or '00B5'?
Here 0xA0 in IBM 1047 is µ (that is, MICRO SIGN), which corresponds to
U+00B5, and its uppercase is U+039C.

> Does someone have any thoughts on the source of the
> problem ?

Possibly a Unicode code value and a native code value are being
confused. When the native encoding is EBCDIC, this causes much more
trouble than it does with ASCII/latin-1.

Or is the value stored in $b1, generated by
pack("U0U*", map { hex } split " ", '00FF'),
really a representation of U+00FF?
Try Devel::Peek: what is the output of Devel::Peek::Dump($b1)?

## example of usage of Devel::Peek ##
use Devel::Peek;
$b1 = pack("U0U*", map { hex } split " ", '00FF');
Dump($b1);

## example of output from Devel::Peek::Dump ##
SV = PV(0x36572c) at 0x182c96c
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x36d9b4 "\303\277"\0 [UTF8 "\x{ff}"]
  CUR = 2
  LEN = 4

Here PV is the string value, and "\303\277" is U+00FF in UTF-8.
In UTF-EBCDIC, the output should be different.

regards,
SADAHIRO Tomoyuki
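
The possible confusion between a Unicode code value and a native code value can be checked with a short script. This is only a minimal sketch, not the original test: the helper name from_unicode_hex is made up for illustration, and it assumes that pack("U", ...) takes a Unicode code point while ord() returns a native one, as the perlfunc and perlebcdic documentation describe.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original test): build a one-character
# string from a Unicode code point given as a hex string.
# Assumption: pack("U", ...) interprets its argument as a Unicode code point.
sub from_unicode_hex {
    my ($hex) = @_;
    return pack("U", hex $hex);
}

my $upper = from_unicode_hex('0178');  # LATIN CAPITAL LETTER Y WITH DIAERESIS
my $lower = from_unicode_hex('00FF');  # LATIN SMALL LETTER Y WITH DIAERESIS

# ord() gives the native ordinal; utf8::native_to_unicode() maps it back
# to the Unicode code point, so any mismatch between the two is visible.
my $native  = ord($lower);
my $unicode = utf8::native_to_unicode($native);
printf "native 0x%02X, Unicode U+%04X\n", $native, $unicode;

# The same case-insensitive character-class match as in the quoted test.
print((":$lower:" =~ /:[$upper]:/i) ? "ok\n" : "not ok\n");
```

On an ASCII-based platform the two numbers coincide (both are FF). On IBM 1047 the native value would be expected to be DF while the Unicode value stays 00FF, so a differing pair there is normal and does not by itself indicate a bug.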