#752: Parrot concatenates iso-8859-1 and utf8 incorrectly
----------------------+-----------------------------------------------------
Reporter: pmichaud | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: core | Version: 1.2.0
Severity: high | Keywords:
Lang: perl6 | Patch:
Platform: |
----------------------+-----------------------------------------------------
Parrot has difficulty concatenating iso-8859-1 and utf8 strings. Here's
the test case:
{{{
$ cat x.pir
.sub 'main'
    $S0 = unicode:"\u00e5\u263b"
    $S1 = chr 0xe5
    $S2 = chr 0x263b
    $S3 = concat $S1, $S2
    if $S0 == $S3 goto equal
    print "not "
  equal:
    say "equal"
.end
$ ./parrot x.pir
Malformed UTF-8 string
current instr.: 'main' pc 13 (x.pir:7)
$
}}}
Note that the exception is raised at the == comparison, not at the
concatenation. If one outputs the value of $S3, it comes out as four bytes
(e5 e2 98 bb). That is not valid UTF-8: e5 is the lead byte of a three-byte
sequence, but the byte after it (e2) is not a continuation byte. The correct
result is five bytes (c3 a5 e2 98 bb). In other words, the iso-8859-1 string
that comes back from chr(229) needs to be converted to utf8 before
concatenation.
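The byte-level difference can be reproduced outside Parrot. This is a minimal Python sketch, not Parrot code. It uses Python's own codecs to build the two operands, then shows two results: the naive byte append that the ticket reports, and the expected result when the iso-8859-1 operand is transcoded to UTF-8 first. The helper name `concat_as_utf8` is hypothetical and does not correspond to any Parrot API.

```python
# Illustration in Python (not Parrot): why appending raw bytes is wrong.
latin1_part = chr(0xE5).encode("iso-8859-1")  # b'\xe5', 1 byte
utf8_part = chr(0x263B).encode("utf-8")       # b'\xe2\x98\xbb', 3 bytes

# Buggy behavior described in the ticket: bytes appended unchanged.
naive = latin1_part + utf8_part


def concat_as_utf8(latin1_bytes: bytes, utf8_bytes: bytes) -> bytes:
    """Hypothetical helper: transcode the iso-8859-1 operand to UTF-8,
    then append the UTF-8 operand."""
    return latin1_bytes.decode("iso-8859-1").encode("utf-8") + utf8_bytes


fixed = concat_as_utf8(latin1_part, utf8_part)

print(naive.hex(" "))  # e5 e2 98 bb
print(fixed.hex(" "))  # c3 a5 e2 98 bb

# The naive result is malformed UTF-8, mirroring Parrot's error at the == test.
try:
    naive.decode("utf-8")
except UnicodeDecodeError as err:
    print("malformed:", err.reason)
```

Decoding `fixed` as UTF-8 yields the same string as the `unicode:"\u00e5\u263b"` literal in the test case.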
This looks very similar to the bug reported in RT #39930 (which has since
been marked as fixed, although that fix apparently doesn't cover this case).
A fix for this is needed for various modules in Rakudo, especially those
dealing with URL encoding and decoding.
Thanks!
Pm
--
Ticket URL: <https://trac.parrot.org/parrot/ticket/752>
Parrot <https://trac.parrot.org/parrot/>
Parrot Development
_______________________________________________
parrot-tickets mailing list
http://lists.parrot.org/mailman/listinfo/parrot-tickets