Mark Dootson <mark.doot...@znix.com> writes:

> None of my machines can be ASCII or EBCDIC by whatever definition this
> doc entry uses [...]

What exactly is an ASCII machine?
ASCII just means: non-EBCDIC.

> Anyhow, I find that after
>
>   $string = decode("utf8", $octets)
>
> $string always has the utf8 flag set, even if $octets is entirely
> ASCII data.

Yes, that is my experience as well. And this does not conform to the
documentation. I've been informed that the documentation is wrong, and
that *all* documentation concerning the utf8 flag seems to be wrong as
well -- or misleading at best.

>   my $string = decode("utf8", $octets);
>
>   ...do whatever string operations in Perl
>
>   $wxobject->SetValue($string);
>
> will always work OK just providing something you did in
> ...do whatever string operations in Perl
> didn't strip the utf8 flag off.

I seem to recall that Perl will sometimes upgrade a string
automatically, but I have never heard of it downgrading one. So unless
you downgrade explicitly, this is not supposed to happen.

> Clear as mud?
> For me too.

Me too :(

> For myself, if I were writing code that handled multibyte char sets in
> existing Wx releases I would do
>
>   my $string = decode("UTF-8", $octets);
>
>   ...do whatever string operations in Perl
>
>   utf8::upgrade($string);
>   $wxobject->SetValue($string);
>
> If you believe that utf8::upgrade($string) is not necessary, then
> don't use it. If your belief is correct, all will work fine.

As long as the string operations did not downgrade the string,
utf8::upgrade($string) is a harmless no-op.

> The more I think about this and actually test what happens, the more I
> lean towards just always expecting string values that get passed to
> wxPerl wrappers to be UTF-8.

When passing a Perl string to the external world it must always be
encoded. This already holds for writing data to files.

The question is: do we consider wxWidgets to be 'external world'? The
answer is most likely 'yes'. But more important: do we consider wxPerl
to be 'external world'? I'd say 'no'.

Therefore, what I'd expect to pass to a wxPerl routine is a string in
Perl's internal encoding.
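
For anyone who wants to check the flag behaviour on their own Perl, here is a minimal sketch. It is not part of the original exchange: the sample octets are made up, and it uses only Encode's decode() plus the core utf8::is_utf8, utf8::upgrade and utf8::downgrade functions. No output is claimed for the decode step, since that result is exactly what the documentation and our observations disagree on.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample input: octets that are entirely ASCII.
my $octets = "plain ascii";

# Decode the octets into a Perl character string.
my $string = decode("UTF-8", $octets);
printf "after decode:    flag %s\n",
    utf8::is_utf8($string) ? "on" : "off";

# utf8::upgrade switches the internal representation to UTF-8 if it
# is not already; the characters themselves are unchanged.
utf8::upgrade($string);
printf "after upgrade:   flag %s\n",
    utf8::is_utf8($string) ? "on" : "off";

# An explicit downgrade of a copy clears the flag again. This works
# here because every character fits in a single byte.
my $copy = $string;
utf8::downgrade($copy);
printf "after downgrade: flag %s, same text: %s\n",
    utf8::is_utf8($copy) ? "on" : "off",
    $copy eq $string ? "yes" : "no";
```

The last line is the relevant one for the wxPerl question: the upgraded and downgraded copies compare equal as Perl strings, so only code that looks at the internal representation can tell them apart.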
The wxPerl wrapper should take care of encoding the string into
whatever encoding wxWidgets requires. Compare this to {some, several,
many, all} DBD drivers that handle the internal/UTF-8 conversions
transparently.

Alternatively, it's an (almost) equally good decision to require all
strings passed to a wxPerl routine to be encoded in UTF-8.

For a program that is correctly equipped to handle multibyte encoded
strings it will not make a difference.

-- Johan