Thu Aug 16 05:11:13 2018: Request 126280 was acted upon.
Transaction: Correspondence added by RSCHUPP
       Queue: PAR-Packer
     Subject: 90-rt122949.t fails when "Use Unicode UTF-8 for worldwide language support" is enabled
   Broken in: (no value)
    Severity: (no value)
       Owner: Nobody
  Requestors: x...@cpan.org
      Status: new
 Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=126280 >

On 2018-08-15 19:56:57, XENU wrote:
> "\357\277\275" is a REPLACEMENT CHARACTER. It seems that when the UTF-8
> checkbox is enabled, bytes that aren't valid UTF-8 are being
> replaced with that character. "\x{85}" obviously isn't a valid UTF-8
> character.

Nope, "\x{85}" is a valid Unicode code point (there's no such thing
as a "UTF-8 character"), cf. http://www.unicode.org/charts/PDF/U0080.pdf

For background information, we're in a murky Windows area here:
when you call the C-level function (somewhere in the guts of PAR::Packer)

    spawnvp(P_WAIT, "some.exe", argv)

you have to manipulate the strings in argv[] so that some.exe
actually sees the original argv in its main(argc, argv).
The most obvious gotcha is when some argv[i] contains blanks,
e.g. "foo bar quux", which will arrive at some.exe as *three* separate
elements of argv[]: "foo", "bar", "quux". See Win32::ShellQuote for details,
that's where I stole most of the test cases from.

Anyway, a 100% solution is probably not possible and "\x{85}", while
legal Unicode, isn't a very relevant test case - it's a control char
("NEXT LINE"). So there may be a reason why Microsoft treats it
differently under "Use Unicode UTF-8 for worldwide language support".

Let's replace this test case with some more relevant uses of strings
with non-ASCII chars:

    [ qq[german umlaute \x{E4}\x{F6}\x{FC}] ],
    [ qq[chinese zhongwen \x{4E2D}\x{6587}] ],

Can you rerun the failing test with these modifications under
"Use Unicode..."?

Cheers, Roderich
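
For reference, the byte-level arithmetic behind the "\357\277\275" observation can be sketched in Perl. This is only an illustration under stated assumptions: it uses the core Encode module, the variable names are made up for the example, and it does not claim to reproduce what Windows does to argv[] when the UTF-8 setting is enabled. It shows that a lone byte 0x85 is malformed as UTF-8 and decodes to U+FFFD, while the code point U+0085 itself has a perfectly valid UTF-8 encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A lone byte 0x85 is a UTF-8 continuation byte, so it is malformed on its own.
my $lone_byte = "\x85";

# With the default CHECK argument, Encode::decode substitutes U+FFFD
# (REPLACEMENT CHARACTER) for malformed input instead of dying.
my $decoded = decode('UTF-8', $lone_byte);

# U+FFFD encoded as UTF-8 is the byte sequence EF BF BD (octal 357 277 275).
my $replacement_bytes = encode('UTF-8', $decoded);

# The code point U+0085 (NEXT LINE) is valid; its UTF-8 encoding is C2 85.
my $nel_bytes = encode('UTF-8', "\x{85}");

# Print the replacement bytes in octal and the NEL bytes in hex.
print join(' ', map { sprintf '%03o', ord } split //, $replacement_bytes), "\n";
print join(' ', map { sprintf '%02X', ord } split //, $nel_bytes), "\n";
```

So both statements in the exchange can hold at once: the byte 0x85 is not valid UTF-8 by itself, and U+0085 is a valid code point. Which of the two a program receives depends on how the string was encoded before it was passed along.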