On 30/04/2013 19:19, Johan Vromans wrote:

We may assume that the Perl string is in Perl's internal encoding.

No, we may not.

AFAIK, when a buffer contains valid UTF-8 (e.g., as result of an earlier
decode), utf8::upgrade is a no-op.

Not necessarily. See the perldoc for decode regarding UTF-8 strings that contain only single-byte characters. Perl has taken this approach for its own reasons, which don't concern us. We are only concerned with marking scalars that contain UTF-8 strings with the utf8 flag when they are passed to the WXSTRING_INPUT macro.
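
To illustrate why upgrading and decoding are not interchangeable, here is a minimal sketch in plain Perl, with no Wx involved. The variable names are only for illustration. It assumes the scalar holds the two octets that encode U+00E9 in UTF-8 and has not yet been decoded:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two octets: the UTF-8 encoding of U+00E9 (e with acute), not yet decoded.
my $octets = "\xC3\xA9";

# utf8::upgrade changes only the internal representation. Each octet is
# treated as one Latin-1 character, so the string still has two characters.
my $upgraded = $octets;
utf8::upgrade($upgraded);

# decode interprets the octets as UTF-8 and yields a single character.
my $decoded = decode('UTF-8', $octets);

printf "upgraded: length=%d flag=%s\n",
    length($upgraded), utf8::is_utf8($upgraded) ? 'on' : 'off';
printf "decoded:  length=%d first=U+%04X\n",
    length($decoded), ord($decoded);
```

The upgraded copy reports a length of 2 with the flag on, while the decoded copy reports a length of 1 and U+00E9, so setting the flag via upgrade does not by itself turn undecoded UTF-8 octets into the intended text.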


As of Perl 5.14, use feature 'unicode_strings' will make sure that all
strings are, indeed, UTF-8. This takes the burden off the programmer to
call utf8::upgrade (and knowing when to call it).


'strings' perhaps - but not necessarily the content of valid scalars you may wish to pass to the wxWidgets library wxString functions. In any case, this is a Perl internals matter that should not be taxing us here; it is irrelevant to the original question.

If I read your proposal correctly, you want to demand that all data
buffers that may get passed to the wxString conversion function are
valid UTF-8?

Not really. They should be in Perl's internal coding, and thus can be
safely and transparently upgraded to UTF-8.

They will only be valid strings in Perl's internal coding if the user has previously coded them to be so.


But in the end I think feature 'unicode_strings' will be the best and
most elegant solution.

I kind of like the existing solution, which doesn't break existing code all over the place and simply requires the coder to be specific about the format of the data they are sending.
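
As a rough sketch of what "being specific" can look like on the caller's side: the helper name text_from_bytes below is hypothetical and is not part of wxPerl. It simply assumes the caller knows the encoding of the incoming bytes and decodes them explicitly before handing the result to any text API:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: the caller states the encoding of the raw bytes,
# and gets back a decoded Perl character string (or an exception if the
# bytes are not valid in that encoding).
sub text_from_bytes {
    my ($bytes, $encoding) = @_;
    return Encode::decode($encoding, $bytes,
        Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# The same character arriving in two different byte formats.
my $from_utf8   = text_from_bytes("\xC3\xA9", 'UTF-8');
my $from_latin1 = text_from_bytes("\xE9",     'iso-8859-1');

print "same text\n" if $from_utf8 eq $from_latin1;
```

Both calls yield the same one-character string, because the format of the data is declared at the point where it enters the program rather than guessed later.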

Cheers

Mark
