#752: Parrot concatenates iso-8859-1 and utf8 incorrectly
----------------------+-----------------------------------------------------
 Reporter:  pmichaud  |       Owner:       
     Type:  bug       |      Status:  new  
 Priority:  normal    |   Milestone:       
Component:  core      |     Version:  1.2.0
 Severity:  high      |    Keywords:       
     Lang:  perl6     |       Patch:       
 Platform:            |  
----------------------+-----------------------------------------------------
 Parrot produces a malformed string when it concatenates an iso-8859-1
 string with a utf8 string.  Here's the test case:

 {{{
 $ cat x.pir
 .sub 'main'
     $S0 = unicode:"\u00e5\u263b"

     $S1 = chr 0xe5
     $S2 = chr 0x263b
     $S3 = concat $S1, $S2

     if $S0 == $S3 goto equal
     print "not "
   equal:
     say "equal"
 .end
 $ ./parrot x.pir
 Malformed UTF-8 string

 current instr.: 'main' pc 13 (x.pir:7)
 $
 }}}

 Note that the exception is thrown at the == comparison, not at the
 concatenation itself.  Printing $S3 shows four bytes (e5 e2 98 bb): the
 e5 byte from the iso-8859-1 string is copied through unconverted, which
 is not valid utf8.  The correct result is five bytes (c3 a5 e2 98 bb),
 i.e., the iso-8859-1 string that comes back from chr(229) needs to be
 converted to utf8 before concatenation.
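
 The byte arithmetic above can be checked outside Parrot.  The following
 is a minimal sketch in Python, used here only as a neutral way to show
 the two byte sequences; it says nothing about how Parrot implements
 concat internally, and the names (naive, fixed, is_valid_utf8) are
 hypothetical helpers introduced for illustration.

```python
# Illustration only: Python bytes stand in for Parrot's string buffers.

# chr 0xe5 as an iso-8859-1 string is the single byte e5.
latin1_part = chr(0xE5).encode("iso-8859-1")

# chr 0x263b does not fit in iso-8859-1, so it is held as utf8: e2 98 bb.
utf8_part = chr(0x263B).encode("utf-8")

# Joining the raw buffers without transcoding gives the four bytes
# reported in this ticket.
naive = latin1_part + utf8_part

# Transcoding the iso-8859-1 part to utf8 first gives the expected five bytes.
fixed = latin1_part.decode("iso-8859-1").encode("utf-8") + utf8_part


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as utf8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(naive.hex(), is_valid_utf8(naive))   # e5e298bb False
print(fixed.hex(), is_valid_utf8(fixed))   # c3a5e298bb True
```

 The four-byte form fails to decode because e5 announces a three-byte
 utf8 sequence, but the byte after it (e2) is not a continuation byte.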

 This looks very similar to the bug reported in RT #39930 (which has since
 been marked as fixed, but apparently doesn't fix this case).

 A fix is needed for several modules in Rakudo, especially those dealing
 with URL encoding and decoding.

 Thanks!

 Pm

-- 
Ticket URL: <https://trac.parrot.org/parrot/ticket/752>
Parrot <https://trac.parrot.org/parrot/>
Parrot Development
_______________________________________________
parrot-tickets mailing list
[email protected]
http://lists.parrot.org/mailman/listinfo/parrot-tickets
