Hi Guys,

I don't have anything broken in this release in the 2 languages that I currently support (English and Portuguese). But even so, the whole utf8 process has been a bit time-consuming. As we become more multilingual, I'm thinking that a global change renaming decode to libDecode (or something similar) would put all the contentious stuff in one place rather than spreading it out all over my code.
So, at the risk of teaching my grandmother to suck eggs:

    use Encode qw(decode);

    sub libDecode ($$) {
        #####
        # Standard decode subroutine to put all UTF bits in one place.
        #
        my $encoding = shift;
        my $string   = shift;
        # I could even put:
        #
        #   $encoding = "UTF-8" if $encoding eq "utf8";
        #
        # just changed in one place in case it doesn't work.
        $string = decode($encoding, $string);
        utf8::upgrade($string);
        return $string;
    }

Or just:

    sub libDecode ($$) {
        # Pass the two arguments explicitly rather than decode(@_):
        # Encode::decode is prototyped, so @_ would not be flattened
        # into its argument list.
        return decode($_[0], $_[1]);
    }

At least everything that might go wrong will be in one place.

Regards
Steve.

-----Original Message-----
From: Mark Dootson [mailto:mark.doot...@znix.com]
Sent: 01 May 2013 18:11
To: wxperl-users@perl.org
Subject: wxString and UTF-8, utf8 etc etc etc again

Hi,

perldoc for the module Encode says:

---------------------------------------------------------------------
CAVEAT: When you run $string = decode("utf8", $octets), then $string might not be equal to $octets. Though both contain the same data, the UTF8 flag for $string is on unless $octets consists entirely of ASCII data on ASCII machines or EBCDIC on EBCDIC machines.
---------------------------------------------------------------------

None of my machines can be ASCII or EBCDIC machines by whatever definition this doc entry uses, as my testing on a variety of platforms shows that on Perl 5.8.8 through Perl 5.16.2 the above is most certainly not true. I shouldn't be surprised really. What exactly is an ASCII machine?

Anyhow, I find that after

    $string = decode("utf8", $octets);

$string always has the utf8 flag set, even if $octets is entirely ASCII data. So .......

    my $string = decode("utf8", $octets);
    # ... do whatever string operations in Perl
    $wxobject->SetValue($string);

will always work OK, provided nothing you did in the "do whatever string operations in Perl" step stripped the utf8 flag off. Not that you'd know if it did.

Clear as mud? For me too.
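The flag behaviour described above is easy to probe on your own Perl. A minimal sketch, assuming only the core Encode module; the sample strings are made up for illustration (the second is the Portuguese word "ação"), and the flag it reports may differ between Perl and Encode versions, which is exactly what the doc caveat leaves unclear:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Two octet strings: one pure ASCII, one holding the UTF-8 bytes of
# "ação" (a, U+00E7, U+00E3, o), i.e. 6 bytes for 4 characters.
my $ascii_octets = "plain text";
my $utf8_octets  = encode("UTF-8", "a\x{e7}\x{e3}o");

for my $octets ($ascii_octets, $utf8_octets) {
    my $string = decode("UTF-8", $octets);
    # Report byte length, character length and the internal UTF8 flag.
    printf "%d bytes -> %d chars, utf8 flag %s\n",
        length($octets), length($string),
        utf8::is_utf8($string) ? "on" : "off";
}
```

Run it on each Perl you deploy to; whatever it prints for the ASCII line is the answer to "what exactly is an ASCII machine" for that build.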
For myself, if I were writing code that handled multibyte character sets in existing Wx releases, I would do:

    my $string = decode("UTF-8", $octets);
    # ... do whatever string operations in Perl
    utf8::upgrade($string);
    $wxobject->SetValue($string);

If you believe that utf8::upgrade($string) is not necessary, then don't use it. If your belief is correct, all will work fine.

The more I think about this and actually test what happens, the more I lean towards just always expecting string values that get passed to wxPerl wrappers to be UTF-8. By that I mean: don't bother testing whether Perl thinks the string is UTF-8 or not; just attempt to convert the data buffer assuming that it is. I think I'll do it for the next release of Wx. Then stuff is much simpler, easier for users to understand, and fits with current dogma on how stuff should work.

The bottom line for anyone thinking "is there something I'll have to change in my code?": the answer is no. Unless it breaks, in which case complain here.

Cheers
Mark
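The decode, process, upgrade sequence above can be wrapped in one place, in the spirit of Steve's libDecode. A minimal sketch: prepare_for_wx is a hypothetical helper (not part of Wx or Encode), and no Wx object is created here, so the SetValue call appears only as a comment. utf8::upgrade leaves the characters unchanged but guarantees the internal buffer is UTF-8 with the flag on, even for pure ASCII:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode incoming octets, apply whatever Perl string
# operations are needed, then force the internal representation to UTF-8
# before the value is handed to a widget.
sub prepare_for_wx {
    my ($octets, $transform) = @_;
    my $string = decode("UTF-8", $octets);
    $string = $transform->($string);
    utf8::upgrade($string);    # same characters, UTF8 flag now on
    return $string;
}

my $value = prepare_for_wx("hello world", sub { uc $_[0] });

# In a wxPerl program this is where you would call, for example,
# $wxobject->SetValue($value);
printf "%s (utf8 flag %s)\n", $value, utf8::is_utf8($value) ? "on" : "off";
```

Doing the upgrade last means it no longer matters whether some intermediate operation dropped the flag.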