Hi Guys,

> Well, this morning I'm inclined to agree that this ought to be the case.
> At least for:
>
>   $orig = readline($datafile);
>   $line = decode( 'UTF-8', $orig );
>   $w = Wx::StaticText->new( ... );
>   $w->SetLabel($line);
>
> On the other hand I'm reluctant to introduce something that I'm certain
> will break someone's code somewhere ( which is the entire basis for my
> objection to making a change. )
>
> So, my thinking is that I'll change it for builds against wxWidgets
> 2.9.x and above and announce on this list and in docs that strings
> passed for wxString must be valid UTF-8.

Well, all this just serves to deepen my confusion.

1) What is the difference between:

   $line = decode( 'UTF-8', $orig );

and

   $line = decode( 'utf8', $orig );

I use the latter and Octavian has used the former, and they both seem to work. Why did the latter not work for Octavian? Which is correct?

2) Mark, your earlier logic seemed clear and unassailable, yet now you seem to have changed your mind. You said:

> So basically, if the scalar is marked as 'utf8' then it gets converted
> into a wxString as such. If not, you're at the mercy of libc and local
> system settings. It may work. It may not.
>
> Solution - your conversion of external data should be
>
>   my $string = decode($encoding, $binary);
>   utf8::upgrade($string);
>
> This should be platform independent and work - always. Perl's string
> functions should all work OK on $string.

I use:

   $line = decode( 'utf8', $orig );

and I never have a problem, but according to this logic that is luck. I accept this, and I am happy to use utf8::upgrade($string);

I think we should assume that in the general case there will always be some Perl processing before wxWidgets sees the string. The general case is:

1 - Retrieve data from file or database (this may be automatically decoded or not, depending on the database and the driver);
2 - Do something to it (this may be a null operation);
3 - Pass to wxWidgets to display to the user.

The point is to keep string lengths and string processing (e.g. a simple alphabetical sort) correct in utf8.
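A minimal sketch of steps 1 to 3 with the decode-then-upgrade pattern Mark quoted. It assumes the input bytes are UTF-8; the sample data, the variable names and the sort are illustrative only, and the Wx call is left as a comment because it needs a running wxPerl application. It also shows that, on well-formed input, Encode's strict 'UTF-8' and Perl's lax 'utf8' produce the same characters (the two differ in which malformed or non-Unicode input they accept).

```perl
use strict;
use warnings;
use Encode qw(decode);

binmode STDOUT, ':encoding(UTF-8)';

# Step 1: raw octets, as readline() would return them from a file opened
# without an :encoding layer (hard-coded here for illustration).
my @raw = ("caf\xC3\xA9", "apple", "\xC3\xA9clair");

# Decode between steps 1 and 2, exactly once per value.
my @strict = map { decode('UTF-8', $_) } @raw;   # strict UTF-8
my @lax    = map { decode('utf8',  $_) } @raw;   # Perl's lax utf8

# On well-formed input both decoders yield identical character strings.
for my $i (0 .. $#raw) {
    die "strict and lax decode differ at index $i\n"
        unless $strict[$i] eq $lax[$i];
}

# Mark's suggestion: make sure the utf8 flag is on before handing to Wx.
# decode() normally sets it already, so this is usually a no-op.
utf8::upgrade($_) for @strict;

# Step 2: Perl processing now works on characters, not bytes.
my @sorted = sort @strict;
printf "%s has %d characters\n", $strict[0], length $strict[0];

# Step 3 (not run here, needs wxPerl): $w->SetLabel($sorted[0]);
```

Had the bytes been processed undecoded, `length` would count the two bytes of "é" separately, which is the string-length problem step 2 runs into.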
If there is to be decoding, it must take place between steps 1 and 2 above.

When you say:

> So, my thinking is that I'll change it for builds against wxWidgets
> 2.9.x and above

what does "it" mean? That you will include utf8::upgrade($string) in the interface? I can't see any harm in this. Setting the character (utf8) flag bit to 1 before an operation and again later just seems redundant at worst.

But if we end up in a position where decode is called twice, this will create problems for me. A doubly decoded value gets corrupted and becomes a diamond with a question mark in it, or some such value.

Regards
Steve.
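A small sketch contrasting the two cases above, assuming UTF-8 input and using Encode's strict 'UTF-8' decoder (the sample string is illustrative). Calling utf8::upgrade twice leaves the characters unchanged, while calling decode twice treats the already-decoded characters as bytes; the lone byte \xE9 is not valid UTF-8, so it is replaced with U+FFFD, the "diamond with a question mark".

```perl
use strict;
use warnings;
use Encode qw(decode);

binmode STDOUT, ':encoding(UTF-8)';

my $octets = "caf\xC3\xA9";           # "café" as UTF-8 bytes
my $once   = decode('UTF-8', $octets); # "caf\x{E9}": 4 characters

# Upgrading repeatedly only (re)sets the internal utf8 flag; the
# characters are unchanged, so a second call is merely redundant.
my $up = $once;
utf8::upgrade($up);
utf8::upgrade($up);
print "upgrade twice: ", ($up eq $once ? "unchanged" : "changed"), "\n";

# Decoding an already-decoded string reinterprets its characters as
# bytes. "\xE9" on its own is malformed UTF-8, so the strict decoder
# substitutes U+FFFD for it.
my $twice = decode('UTF-8', $once);
printf "decoded twice: %s (last char U+%04X)\n",
    $twice, ord substr($twice, -1);
```

This is why an interface that upgrades is harmless but one that decodes on the caller's behalf would break code that already decodes.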