# New Ticket Created by  Zefram 
# Please include the string:  [perl #128512]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=128512 >


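A decode-then-encode cycle through the utf8-c8 encoding is meant to
round-trip an octet string.  But if the input is the UTF-8 encoding
of a string that's not NFC normalised, the output differs from the
input, because NFC normalisation gets performed somewhere in the
middle.  Here the decomposed input (e followed by U+0301) comes back
as the precomposed U+00E9, the same octets as for the input that was
already normalised: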

> Blob[uint8].new(101, 204, 129).decode("utf8-c8").encode("utf8-c8").perl
Blob[uint8].new(195,169)
> Blob[uint8].new(195, 169).decode("utf8-c8").encode("utf8-c8").perl
Blob[uint8].new(195,169)
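
The octets above are exactly what NFC normalisation on its own would
produce.  As a rough cross-check outside Perl 6, here is a minimal
Python sketch.  It does not model utf8-c8 at all: it only decodes plain
UTF-8, applies NFC, and re-encodes.  The helper name nfc_utf8 is
hypothetical, introduced purely for illustration.

```python
import unicodedata


def nfc_utf8(octets: bytes) -> bytes:
    """Decode UTF-8, apply NFC, re-encode.

    Hypothetical helper: it stands in for the normalisation step that
    appears to happen in the middle of the round trip. It is not the
    utf8-c8 implementation.
    """
    text = octets.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")


decomposed = bytes([101, 204, 129])  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = bytes([195, 169])      # U+00E9, already in NFC

print(list(nfc_utf8(decomposed)))    # [195, 169]
print(list(nfc_utf8(precomposed)))   # [195, 169]
```

Both inputs collapse to the same two octets, which is the behaviour
reported for the utf8-c8 round trip.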

This failure to round-trip is of particular concern for things like
access to command-line arguments, where two distinct arguments become
indistinguishable:

$ perl6 -e 'say @*ARGS[0].encode("utf8-c8")' $'e\xcc\x81'
Blob[uint8]:0x<c3 a9>
$ perl6 -e 'say @*ARGS[0].encode("utf8-c8")' $'\xc3\xa9'
Blob[uint8]:0x<c3 a9>

-zefram
