[perl #60396] [BUG] escape opcode returns incorrect result

2008-11-07 Thread Patrick R. Michaud (via RT)
# New Ticket Created by  Patrick R. Michaud 
# Please include the string:  [perl #60396]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=60396 


There's a bug somewhere in the escape opcode 
(r32442, no libicu present).  Here's the test case:

  $ cat y.pir
  .sub main
  $S0 = unicode:"x/\u0445\u0440\u0435\u043d\u044c_09-10.txt"
  say $S0
  $S1 = escape $S0
  say $S1
  .end
  
  $ ./parrot y.pir
  x/хрень_09-10.txt
  x/\u0445\u0440\u0435\u043d\u044c9-10.txt

We start by constructing a unicode string (originally from RT #58820)
and displaying it, then we escape the string and display that.
The escaped version should be the same as what appears in the
quotes in the unicode:... literal, but as you can see above
the _0 characters present in the original string are lost in
the escaped version.  A hex dump shows that they are being turned 
into NUL bytes somehow:

  $ ./parrot y.pir | xxd
  000: 782f d185 d180 d0b5 d0bd d18c 5f30 392d  x/.........._09-
  010: 3130 2e74 7874 0a78 2f5c 7530 3434 355c  10.txt.x/\u0445\
  020: 7530 3434 305c 7530 3433 355c 7530 3433  u0440\u0435\u043
  030: 645c 7530 3434 6300 0039 2d31 302e 7478  d\u044c..9-10.tx
  040: 740a                                     t.
  $
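
For anyone who wants to confirm where the NUL bytes sit without rerunning parrot, here is a small Python sketch that reads the hex columns of the dump above and reports the offsets of the zero bytes within the escaped line. The variable names are only illustrative, and the hex is copied by hand from the dump.

```python
# Hex columns copied from the xxd output in this ticket.
dump_hex = """
782f d185 d180 d0b5 d0bd d18c 5f30 392d
3130 2e74 7874 0a78 2f5c 7530 3434 355c
7530 3434 305c 7530 3433 355c 7530 3433
645c 7530 3434 6300 0039 2d31 302e 7478
740a
"""

# Strip all whitespace before decoding the hex digits into bytes.
data = bytes.fromhex("".join(dump_hex.split()))

# The program prints two lines: the original string, then the escaped one.
original, escaped = data.split(b"\n")[:2]

# Offsets (within the escaped line) of every NUL byte.
nul_offsets = [i for i, b in enumerate(escaped) if b == 0]

print(original.decode("utf-8"))
print(nul_offsets)
print(escaped[nul_offsets[-1] + 1:])
```

The two NUL bytes land exactly where the "_0" of the original string should be, immediately after the last \u044c escape and before "9-10.txt".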

This bug appears to be very sensitive to the contents of this
particular string -- adding, removing, or otherwise changing the 
string contents causes the bug to disappear.
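
As a cross-check on the expected output, here is a rough Python model of the behavior this test case relies on: printable ASCII passes through unchanged and other BMP characters become \uXXXX. This is not Parrot's implementation, and escape_reference is a made-up name; the real opcode's handling of backslashes, quotes, control characters, and non-BMP code points is not modelled here.

```python
def escape_reference(s: str) -> str:
    """Hypothetical reference model, NOT the Parrot escape opcode."""
    out = []
    for ch in s:
        cp = ord(ch)
        if 0x20 <= cp < 0x7F:
            # Printable ASCII is copied through as-is (simplification:
            # backslash and quote are not treated specially here).
            out.append(ch)
        elif cp <= 0xFFFF:
            # Any other BMP character becomes a four-digit \u escape.
            out.append(f"\\u{cp:04x}")
        else:
            # Outside the scope of this sketch.
            raise ValueError(f"code point U+{cp:X} not modelled")
    return "".join(out)


source = "x/\u0445\u0440\u0435\u043d\u044c_09-10.txt"
print(escape_reference(source))
```

Under these assumptions the result keeps "_09-10.txt" intact, which is what the second line of output from y.pir should look like.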

Pm


Re: [perl #60396] [BUG] escape opcode returns incorrect result

2008-11-07 Thread chromatic
On Friday 07 November 2008 14:56:34 Patrick R. Michaud (via RT) wrote:

> There's a bug somewhere in the escape opcode
> (r32442, no libicu present).  Here's the test case:
>
>   $ cat y.pir
>   .sub main
>   $S0 = unicode:"x/\u0445\u0440\u0435\u043d\u044c_09-10.txt"
>   say $S0
>   $S1 = escape $S0
>   say $S1
>   .end
>
>   $ ./parrot y.pir
>   x/хрень_09-10.txt
>   x/\u0445\u0440\u0435\u043d\u044c9-10.txt
>
> We start by constructing a unicode string (originally from RT #58820)
> and displaying it, then we escape the string and display that.
> The escaped version should be the same as what appears in the
> quotes in the unicode:... literal, but as you can see above
> the _0 characters present in the original string are lost in
> the escaped version.  A hex dump shows that they are being turned
> into NUL bytes somehow:
>
>   $ ./parrot y.pir | xxd
>   000: 782f d185 d180 d0b5 d0bd d18c 5f30 392d  x/.........._09-
>   010: 3130 2e74 7874 0a78 2f5c 7530 3434 355c  10.txt.x/\u0445\
>   020: 7530 3434 305c 7530 3433 355c 7530 3433  u0440\u0435\u043
>   030: 645c 7530 3434 6300 0039 2d31 302e 7478  d\u044c..9-10.tx
>   040: 740a                                     t.
>   $
>
> This bug appears to be very sensitive to the contents of this
> particular string -- adding, removing, or otherwise changing the
> string contents causes the bug to disappear.

Fixed in r32444, with your code turned into a test.  Thanks!

(This also fixes RT #58820).

-- c