Fri Jun 07 05:09:50 2013: Request 85943 was acted upon.
Transaction: Ticket created by j...@pavlovsky.eu
Queue: Wx
Subject: utf8 handling bug
Broken in: (no value)
Severity: (no value)
Owner: Nobody
Requestors: j...@pavlovsky.eu
Status: new
Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=85943 >

Hi,

I had the following problem: I pass an array of objects with stringification overloading. The combobox displays the stringified values and returns the selected object. This works great unless the stringified value contains accented characters; then the displayed value is messed up.

I thought I had traced the problem to a bug in stringification/utf8 handling and reported it to perl-bug. But I got a reply suggesting it's a bug in Wx::Perl. See below for details:

I think this is a bug in Wx::Perl. I just downloaded Wx-0.9922 from CPAN and did a quick scan. cpp/helpers.cpp contains this, which I assume is a utility function used by various parts of Wx::Perl:

    #if wxUSE_UNICODE
    static wxChar* wxPli_copy_string( SV* scalar, wxChar** )
    {
        dTHX;
        STRLEN length;
        wxWCharBuffer tmp = ( SvUTF8( scalar ) ) ?
            wxConvUTF8.cMB2WX( SvPVutf8( scalar, length ) ) :
            wxWCharBuffer( wxString( SvPV( scalar, length ), wxConvLocal ).wc_str() );

        wxChar* buffer = new wxChar[length + 1];
        memcpy( buffer, tmp.data(), length * sizeof(wxChar) );
        buffer[length] = wxT('\0');
        return buffer;
    }
    #endif

Checking SvUTF8(scalar) before any stringification is incorrect. What it should be doing is something like this:

    dTHX;
    STRLEN length;
    char * const s = SvPV( scalar, length );
    wxWCharBuffer tmp = ( SvUTF8( scalar ) ) ?
        wxConvUTF8.cMB2WX( s ) :
        wxWCharBuffer( wxString( s, wxConvLocal ).wc_str() );

I don’t know what wxConvLocal does, but if it does anything other than treat the string as Latin1, then that is also incorrect, and this would be better:

    dTHX;
    STRLEN length;
    wxWCharBuffer tmp = wxConvUTF8.cMB2WX( SvPVutf8( scalar, length ) );

This aspect of SvUTF8 is nothing new; it has been documented since 2006 (commit cd028baaa4):

    SvUTF8
        Returns a U32 value indicating the UTF-8 status of an SV. If
        things are set-up properly, this indicates whether or not the SV
        contains UTF-8 encoded data. You should use this after a call to
        SvPV() or one of its variants, in case any call to string
        overloading updates the internal flag.

(The current wording is of recent provenance and comes from commit fd1423831.)

I don’t know enough about Wx to write a test case, so could you report this to bug...@rt.cpan.org?

-- 
Jiří Pavlovský
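
The ordering problem described in the reply can be illustrated outside Perl with a small mock. The sketch below is not the Perl or Wx API: `MockScalar`, `stringify()`, `make_overloaded()`, `choose_decoder_buggy()` and `choose_decoder_fixed()` are hypothetical names invented for this illustration. It assumes only what the quoted SvUTF8 documentation states, namely that string overloading may update the UTF-8 flag when the scalar is stringified.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a Perl scalar with overloaded stringification.
// This is NOT the real Perl API; it only models the behaviour at issue.
struct MockScalar {
    bool utf8_flag;              // plays the role of SvUTF8()
    std::string bytes;           // string buffer as seen after stringification
    bool has_overload;           // true if stringification runs user code
    std::string overload_bytes;  // what the overloaded "" operator would return
    bool overload_utf8;          // UTF-8 status of that result

    // Plays the role of SvPV(): may run the overload and update the flag.
    const std::string& stringify() {
        if (has_overload) {
            bytes = overload_bytes;
            utf8_flag = overload_utf8;
        }
        return bytes;
    }
};

enum class Decoder { Utf8, Local };

// Assumed helper: builds an object whose flag is off until it is stringified.
inline MockScalar make_overloaded(std::string text, bool is_utf8) {
    return MockScalar{false, std::string(), true, std::move(text), is_utf8};
}

// Same order as the quoted wxPli_copy_string: the flag is read first,
// so it is stale by the time the string has been produced.
inline Decoder choose_decoder_buggy(MockScalar& sv) {
    const bool is_utf8 = sv.utf8_flag;
    sv.stringify();
    return is_utf8 ? Decoder::Utf8 : Decoder::Local;
}

// Order suggested in the reply: stringify first, then read the flag.
inline Decoder choose_decoder_fixed(MockScalar& sv) {
    sv.stringify();
    return sv.utf8_flag ? Decoder::Utf8 : Decoder::Local;
}
```

For a scalar built with `make_overloaded("Ji\xC5\x99\xC3\xAD", true)`, the first function picks `Decoder::Local` and the second picks `Decoder::Utf8`, which is consistent with accented characters being displayed incorrectly in the combobox.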