Hi,
On 01/05/2013 07:34, Johan Vromans wrote:
Mark Dootson <mark.doot...@znix.com> writes:
On 30/04/2013 19:19, Johan Vromans wrote:
We may assume that the Perl string is in Perl's internal encoding.
No we may not.
In that case you'll run into all kinds of encoding problems anyway.
If you attempt any string operations, indeed you will.
See e.g. perlunitut.
I kind of like the existing solution which doesn't break existing code
all over the place and simply requires the coder to be specific about
the format of the data they are sending.
My main concern is: if I have correctly decoded string data, will it
work when passed to wxWidgets? For example:
$orig = readline($datafile);
$line = decode( 'utf8', $orig );
$w = Wx::StaticText->new( ... );
$w->SetLabel($line);
If an explicit utf8::upgrade were required in this case, my feelings
tell me something is wrong.
Well, this morning I'm inclined to agree that this ought to be the case.
At least for:
$orig = readline($datafile);
$line = decode( 'UTF-8', $orig );
$w = Wx::StaticText->new( ... );
$w->SetLabel($line);
On the other hand I'm reluctant to introduce something that I'm certain
will break someone's code somewhere ( which is the entire basis for my
objection to making a change. )
So, my thinking is that I'll change it for builds against wxWidgets
2.9.x and above and announce on this list and in docs that strings
passed for wxString must be valid UTF-8.
I'll probably just use SvPVutf8_nolen on everything if this tests ok.
(For the benefit of the casual reader: the "force" part in SvPVutf8_force
refers to changing the SV to have a pv ( string ) representation only. It
has nothing to do with utf8. You would use it if you expected that the
C / C++ code might change the value directly, so that Perl is forced to
re-evaluate the next time you use the SV in a numeric context. In our code
the pv value will never be changed directly.)
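To make the distinction concrete, here is a minimal pure-Perl sketch (no Wx involved) of the difference between raw octets, a decoded character string, and a string whose internal storage has been upgraded. The variable names and the sample text are made up for illustration. The last line uses encode('UTF-8', ...) only as a stand-in for the bytes I would expect a UTF-8 fetch such as SvPVutf8 to hand to the C++ side; that equivalence is my assumption here, not something taken from helpers.h.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw octets, as readline() would return them from a handle with no
# encoding layer. "\xc3\xa9" is the UTF-8 encoding of e-acute.
my $orig = "caf\xc3\xa9";

# decode() turns the octets into a Perl character string.
my $line = decode('UTF-8', $orig);

printf "octets: length=%d flag=%s\n",
    length($orig), utf8::is_utf8($orig) ? 'on' : 'off';
printf "chars:  length=%d flag=%s\n",
    length($line), utf8::is_utf8($line) ? 'on' : 'off';

# A character string that Perl happens to store one byte per character.
my $latin1 = "caf\xe9";
my $copy   = $latin1;

# utf8::upgrade changes only the internal storage, not the string value.
utf8::upgrade($copy);
print "equal after upgrade: ", ($copy eq $latin1 ? "yes" : "no"), "\n";

# Assumption: these are the bytes a UTF-8 fetch of the SV would yield.
my $bytes = encode('UTF-8', $latin1);
print "UTF-8 bytes: ",
    join(' ', map { sprintf '%02x', ord } split //, $bytes), "\n";
```

The point of the sketch is that $line and $latin1 are both perfectly good character strings from Perl's point of view, whatever their internal storage, whereas $orig is still a string of octets and would be treated as five separate characters if passed along undecoded.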
For anyone interested, the relevant code is in cpp/helpers.h, wrapped in
a three-part #if / #elif / #else:
#if defined(wxUSE_UNICODE_UTF8) && wxUSE_UNICODE_UTF8
// Mac OSX and Linux
#elif wxUSE_UNICODE
// Windows
#else
// 2.8 ANSI build ( ignore it )
#endif
Macros
WXCHAR_INPUT, WXCHAR_OUTPUT, WXSTRING_INPUT, WXSTRING_OUTPUT
are used via the typemap.
Functions wxPli_wxChar_2_sv and wxPli_wxString_2_sv are also used
throughout the Wx code.
You will note that the return value from a wxString or multibyte char is
always flagged as utf8.
Regards
Mark