[perl #60396] [BUG] escape opcode returns incorrect result
# New Ticket Created by Patrick R. Michaud
# Please include the string: [perl #60396]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=60396

There's a bug somewhere in the escape opcode (r32442, no libicu present).
Here's the test case:

    $ cat y.pir
    .sub main
        $S0 = unicode:"x/\u0445\u0440\u0435\u043d\u044c_09-10.txt"
        say $S0
        $S1 = escape $S0
        say $S1
    .end
    $ ./parrot y.pir
    x/хрень_09-10.txt
    x/\u0445\u0440\u0435\u043d\u044c9-10.txt

We start by constructing a unicode string (originally from RT #58820) and
displaying it, then we escape the string and display that. The escaped
version should be the same as what appears in the quotes in the
unicode:"..." literal, but as you can see above, the "_0" characters
present in the original string are lost in the escaped version. A hex
dump shows that they are being turned into NUL bytes somehow:

    $ ./parrot y.pir | xxd
    000: 782f d185 d180 d0b5 d0bd d18c 5f30 392d  x/.._09-
    010: 3130 2e74 7874 0a78 2f5c 7530 3434 355c  10.txt.x/\u0445\
    020: 7530 3434 305c 7530 3433 355c 7530 3433  u0440\u0435\u043
    030: 645c 7530 3434 6300 0039 2d31 302e 7478  d\u044c..9-10.tx
    040: 740a                                     t.
    $

This bug appears to be very sensitive to the contents of this particular
string -- adding, removing, or otherwise changing the string contents
causes the bug to disappear.

Pm
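For readers without a Parrot build at hand, the expected result can be sketched outside Parrot. The following is a minimal Python illustration, not Parrot's implementation: `escape_non_ascii` is a hypothetical helper that only handles what this test string needs (it ignores control characters, quotes and backslashes, and its `\U` form for code points beyond U+FFFF is a choice made for this sketch). The `buggy` string is reconstructed from the bytes in the hex dump above.

```python
def escape_non_ascii(text: str) -> str:
    """Hypothetical helper: write non-ASCII code points as \\uXXXX escapes.

    ASCII characters pass through unchanged. This is a simplification for
    illustration and makes no claim about how Parrot's escape opcode works.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # plain ASCII: keep as-is
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)     # BMP code point: \uXXXX
        else:
            out.append("\\U%08x" % cp)     # sketch-only choice for non-BMP
    return "".join(out)


# The string from the ticket, written with Python's own \u escapes.
original = "x/\u0445\u0440\u0435\u043d\u044c_09-10.txt"
escaped = escape_non_ascii(original)
print(escaped)

# What the report expects: the text between the quotes of the PIR literal.
assert escaped == r"x/\u0445\u0440\u0435\u043d\u044c_09-10.txt"
assert "\x00" not in escaped

# The faulty output, rebuilt from the hex dump: two NUL bytes where "_0" was.
buggy = r"x/\u0445\u0440\u0435\u043d\u044c" + "\x00\x00" + "9-10.txt"
assert buggy != escaped
assert len(buggy) == len(escaped)   # same length: "_0" was overwritten, not dropped
```

The length check matches the report's reading of the hex dump: the faulty output has the same number of characters as the expected one, so the two characters were replaced by NULs rather than removed.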
Re: [perl #60396] [BUG] escape opcode returns incorrect result
On Friday 07 November 2008 14:56:34 Patrick R. Michaud (via RT) wrote:

> There's a bug somewhere in the escape opcode (r32442, no libicu present).
> Here's the test case:
>
>     $ cat y.pir
>     .sub main
>         $S0 = unicode:"x/\u0445\u0440\u0435\u043d\u044c_09-10.txt"
>         say $S0
>         $S1 = escape $S0
>         say $S1
>     .end
>     $ ./parrot y.pir
>     x/хрень_09-10.txt
>     x/\u0445\u0440\u0435\u043d\u044c9-10.txt
>
> We start by constructing a unicode string (originally from RT #58820) and
> displaying it, then we escape the string and display that. The escaped
> version should be the same as what appears in the quotes in the
> unicode:"..." literal, but as you can see above, the "_0" characters
> present in the original string are lost in the escaped version. A hex
> dump shows that they are being turned into NUL bytes somehow:
>
>     $ ./parrot y.pir | xxd
>     000: 782f d185 d180 d0b5 d0bd d18c 5f30 392d  x/.._09-
>     010: 3130 2e74 7874 0a78 2f5c 7530 3434 355c  10.txt.x/\u0445\
>     020: 7530 3434 305c 7530 3433 355c 7530 3433  u0440\u0435\u043
>     030: 645c 7530 3434 6300 0039 2d31 302e 7478  d\u044c..9-10.tx
>     040: 740a                                     t.
>     $
>
> This bug appears to be very sensitive to the contents of this particular
> string -- adding, removing, or otherwise changing the string contents
> causes the bug to disappear.

Fixed in r32444, with your code turned into a test. Thanks!

(This also fixes RT #58820.)

-- c