On 30/04/2013 19:19, Johan Vromans wrote:
We may assume that the Perl string is in Perl's internal encoding.
No we may not.
AFAIK, when a buffer contains valid UTF-8 (e.g., as result of an earlier
decode), utf8::upgrade is a no-op.
Not necessarily. See the perldoc for decode regarding UTF-8 strings
containing only single-byte characters. Perl has taken this approach for
whatever reason it has taken it; that doesn't concern us. We are only
concerned with marking scalars that contain UTF-8 strings with the utf8
flag when passing them to the WXSTRING_INPUT macro.
As of Perl 5.14, use feature 'unicode_strings' will make sure that all
strings are, indeed, UTF-8. This takes the burden off the programmer to
call utf8::upgrade (and knowing when to call it).
'strings' perhaps, but not necessarily the content of valid scalars you
may wish to pass to the wxWidgets library wxString functions. In any
case, this is a Perl internals thing which should not be taxing us here;
it is irrelevant to the original question.
If I read your proposal correctly, you want to demand that all data
buffers that may get passed to the wxString conversion function are
valid UTF-8?
Not really. They should be in Perl's internal coding, and thus can be
safely and transparently upgraded to UTF-8.
They will only be valid strings in Perl's internal coding if the user
has previously coded it to be so.
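
To illustrate the difference, here is a minimal sketch using only the
core utf8:: functions (no wxPerl involved; the variable names are just
for illustration). It assumes the same character, U+00E9, arrives once
as a native Perl string and once as raw UTF-8 octets:

```perl
use strict;
use warnings;

# One character, U+00E9, in Perl's native single-byte form (utf8 flag off).
my $native   = "\xE9";
my $upgraded = $native;
utf8::upgrade($upgraded);   # changes only the internal representation

# The same character as two raw UTF-8 octets, e.g. read from a file or socket.
my $octets = "\xC3\xA9";

my $wrong = $octets;
utf8::upgrade($wrong);      # treats each octet as a character: still 2 characters

my $right = $octets;
utf8::decode($right);       # interprets the octets as UTF-8: 1 character, U+00E9

# Report the character count and utf8 flag of each variant.
printf "%-9s chars=%d flag=%d\n",
    $_->[0], length($_->[1]), utf8::is_utf8($_->[1]) ? 1 : 0
    for [native => $native], [upgraded => $upgraded],
        [wrong  => $wrong],  [right    => $right];
```

Upgrading is only transparent for the native string. Upgrading the raw
octets sets the flag but leaves two characters, so the data would reach
wxString double-encoded unless it was decoded first.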
But in the end I think feature 'unicode_strings' will be the best and
most elegant solution.
I kind of like the existing solution, which doesn't break existing code
all over the place and simply requires the coder to be specific about
the format of the data they are sending.
Cheers
Mark