[rt.cpan.org #126280] 90-rt122949.t fails when "Use Unicode UTF-8 for worldwide language support" is enabled

Roderich Schupp via RT Thu, 16 Aug 2018 02:11:56 -0700

Thu Aug 16 05:11:13 2018: Request 126280 was acted upon.
Transaction: Correspondence added by RSCHUPP
       Queue: PAR-Packer
     Subject: 90-rt122949.t fails when "Use Unicode UTF-8 for worldwide language support" is enabled
   Broken in: (no value)
    Severity: (no value)
       Owner: Nobody
  Requestors: x...@cpan.org
      Status: new
 Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=126280 >

On 2018-08-15 19:56:57, XENU wrote:
> "\357\277\275" is a REPLACEMENT CHARACTER. It seems that when the UTF-
> 8 checkbox is enabled, bytes that aren't valid UTF-8 are being
> replaced with that character. "\x{85}" obviously isn't a valid UTF-8
> character.

Nope, "\x{85}" is a valid Unicode code point (there's no such thing as a
"UTF-8 character"), cf. http://www.unicode.org/charts/PDF/U0080.pdf

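To make the code point versus encoding distinction concrete, here is a
minimal C sketch. The helper name utf8_encode is hypothetical (it is not part
of PAR::Packer or its test suite). It only shows that code point U+0085 has a
well-formed two-byte UTF-8 encoding, and that U+FFFD REPLACEMENT CHARACTER
encodes to the bytes "\357\277\275" quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Encodes one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is a surrogate or
 * lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Under this sketch, utf8_encode(0x85, buf) writes the two bytes C2 85. A lone
byte 0x85, by contrast, is a continuation byte and not a complete UTF-8
sequence on its own, which may be what the original report was getting at.
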
For background information, we're in a murky Windows area here:
when you call the C-level function (somewhere in the guts of PAR::Packer)

  spawnvp(P_WAIT, "some.exe", argv)

you actually have to manipulate the strings in argv[] so that some.exe
sees the original argv in its

   main(argc, argv)

The most obvious gotcha is when some argv[i] contains blanks, e.g.
"foo bar quux", which, passed as is, will arrive at some.exe as *three*
separate elements of argv[]: "foo", "bar", "quux". See Win32::ShellQuote for
details; that's where I stole most of the test cases from.
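
The kind of argv[] manipulation meant here can be sketched in C as follows.
This is an illustration only, not the code in PAR::Packer: quote_arg is a
hypothetical name, and the rules assume that some.exe splits its command line
the way the Microsoft C runtime does (double quotes group blanks; backslashes
are literal unless they precede a double quote).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Writes a quoted copy of arg into out so that a program using the
 * Microsoft C runtime's argument parsing sees arg as ONE argv element.
 * Returns 0 on success, -1 if cap is smaller than the worst case
 * (2 * strlen(arg) + 3 bytes). */
int quote_arg(const char *arg, char *out, size_t cap)
{
    size_t len = strlen(arg);
    char *p = out;

    if (cap < 2 * len + 3)
        return -1;

    /* Nothing to protect: no whitespace, no double quote, not empty. */
    if (len > 0 && strpbrk(arg, " \t\n\v\"") == NULL) {
        memcpy(out, arg, len + 1);
        return 0;
    }

    *p++ = '"';
    for (const char *s = arg; ; s++) {
        size_t backslashes = 0;
        size_t i;

        while (*s == '\\') {
            s++;
            backslashes++;
        }
        if (*s == '\0') {
            /* Double trailing backslashes so they do not escape
             * the closing quote. */
            for (i = 0; i < 2 * backslashes; i++)
                *p++ = '\\';
            break;
        }
        if (*s == '"') {
            /* Double the backslashes and add one to escape the quote. */
            for (i = 0; i < 2 * backslashes + 1; i++)
                *p++ = '\\';
        } else {
            /* Backslashes not followed by a quote stay literal. */
            for (i = 0; i < backslashes; i++)
                *p++ = '\\';
        }
        *p++ = *s;
    }
    *p++ = '"';
    *p = '\0';
    return 0;
}
```

With this sketch, foo bar quux is passed as "foo bar quux" (including the
double quotes) and should arrive as a single argv element under the stated
assumption. It does nothing about non-ASCII characters, which is the actual
issue in this ticket.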

Anyway, a 100% solution is probably not possible and "\x{85}", while legal
Unicode, isn't a very relevant test case - it's a control char ("NEXT LINE").
So there may be a reason why Microsoft treats it differently under
"Use Unicode UTF-8 for worldwide language support".
Let's replace this test case with some more relevant uses of strings
with non-ASCII chars:

  [ qq[german umlaute \x{E4}\x{F6}\x{FC}] ],
  [ qq[chinese zhongwen \x{4E2D}\x{6587}] ],

Can you rerun the failing test with these modifications under "Use Unicode..."?

Cheers, Roderich
