Hi,

On 01/05/2013 07:34, Johan Vromans wrote:
Mark Dootson <mark.doot...@znix.com> writes:

On 30/04/2013 19:19, Johan Vromans wrote:

We may assume that the Perl string is in Perl's internal encoding.

No we may not.

In that case you'll run into all kinds of encoding problems anyway.

If you attempt any string operations, indeed you will.


See e.g. perlunitut.

I kind of like the existing solution which doesn't break existing code
all over the place and simply requires the coder to be specific about
the format of the data they are sending.

My main concern is: If I have correctly decoded string data, will it
work when passed to wxWidgets? For example:

   $orig = readline($datafile);
   $line = decode( 'utf8', $orig );
   $w = Wx::StaticText->new( ... );
   $w->SetLabel($line);

If an explicit utf8::upgrade were required in this case, my feelings
tell me something is wrong.

Well, this morning I'm inclined to agree that this ought to work without an explicit utf8::upgrade, at least for:

$orig = readline($datafile);
$line = decode( 'UTF-8', $orig );
$w = Wx::StaticText->new( ... );
$w->SetLabel($line);
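
To make the octets-versus-characters distinction concrete, here is a minimal sketch that uses only core modules and no Wx at all. The sample data ("café" as UTF-8 octets) is made up for illustration and stands in for what readline would return from a file opened without an encoding layer.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Made-up sample data: "café" as raw UTF-8 octets, standing in for
# a line read from a file that has no encoding layer applied.
my $orig = "caf\xC3\xA9";

# Decode the octets into a Perl character string.
my $line = decode('UTF-8', $orig);

# The raw data is 5 octets long; the decoded string is 4 characters.
printf "octets: %d, characters: %d\n", length($orig), length($line);
# prints "octets: 5, characters: 4"
```

It is $line, the decoded character string, that would be handed to SetLabel in the example above.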

On the other hand, I'm reluctant to introduce something that I'm certain will break someone's code somewhere (which is the entire basis for my objection to making a change).

So, my thinking is that I'll change it for builds against wxWidgets 2.9.x and above and announce on this list and in docs that strings passed for wxString must be valid UTF-8.

I'll probably just use SvPVutf8_nolen on everything if this tests OK. (For the casual reader's information: the "force" part of SvPVutf8_force refers to forcing the SV to have only a pv (string) representation; it has nothing to do with utf8. You would use it if you expected the C / C++ code to change the value directly, so that Perl re-evaluates the SV the next time you use it in a numeric context. In our code the pv value will never be changed directly.)
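
As a rough Perl-level approximation of what the SvPVutf8 family hands to the C++ side, the sketch below uses utf8::encode on a copy of the string. The helper name utf8_octets_of is hypothetical, and the assumption that some existing code passes undecoded octets is mine; it only illustrates one way such code could end up double-encoded.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the UTF-8 octets for a string's characters.
# This approximates the buffer the SvPVutf8 macros would expose to C++.
sub utf8_octets_of {
    my ($string) = @_;
    my $copy = $string;     # leave the caller's scalar untouched
    utf8::encode($copy);    # characters -> UTF-8 octets, in place
    return $copy;
}

my $raw     = "caf\xC3\xA9";          # undecoded UTF-8 octets
my $decoded = decode('UTF-8', $raw);  # proper character string

# A decoded string comes out as the expected UTF-8 octets.
printf "decoded: %s\n", unpack('H*', utf8_octets_of($decoded));
# prints "decoded: 636166c3a9"

# Undecoded octets are treated as Latin-1 characters and encoded again.
printf "raw:     %s\n", unpack('H*', utf8_octets_of($raw));
# prints "raw:     636166c383c2a9"
```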

For anyone interested, the relevant code is in cpp/helpers.h, wrapped in a three-part #if / #elif / #else:

#if defined(wxUSE_UNICODE_UTF8) && wxUSE_UNICODE_UTF8

// Mac OSX and Linux

#elif wxUSE_UNICODE

// Windows

#else

// 2.8 ANSI build ( ignore it )

#endif

The macros WXCHAR_INPUT, WXCHAR_OUTPUT, WXSTRING_INPUT and WXSTRING_OUTPUT are used via the typemap.

Functions wxPli_wxChar_2_sv and wxPli_wxString_2_sv are also used throughout the Wx code.

You will note that the return value from a wxString or multibyte char is always flagged as utf8.
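
For anyone unsure what "flagged as utf8" means on the Perl side, here is a small sketch with no Wx involved. It uses utf8::upgrade to simulate a flagged return value; that simulation is an assumption for illustration and is not the code in helpers.h.

```perl
use strict;
use warnings;

# A one-byte-per-character string; the UTF8 flag is off.
my $plain = "caf\xE9";

# Simulate a flagged string by upgrading a copy.
my $flagged = $plain;
utf8::upgrade($flagged);

printf "plain flagged:   %s\n", utf8::is_utf8($plain)   ? 'yes' : 'no';
printf "flagged flagged: %s\n", utf8::is_utf8($flagged) ? 'yes' : 'no';

# The flag only describes the internal storage; both scalars hold the
# same four characters and compare equal.
printf "equal: %s\n", $plain eq $flagged ? 'yes' : 'no';
```

The point is that the flag changes how Perl stores the string, not which characters it contains.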


Regards

Mark