Mark Dootson <mark.doot...@znix.com> writes:

> None of my machines can be ASCII or EBCDIC by whatever definition this
> doc entry uses [...] What exactly is an ASCII machine?

ASCII just means: non-EBCDIC.

> Anyhow, I find that after
>
> $string = decode("utf8", $octets)
>
> $string always has the utf8 flag set, even if $octets is entirely
> ASCII data.

Yes, that is my experience as well. And this does not conform to the
documentation. I've been informed that the documentation is wrong, and
that *all* documentation concerning the utf8 flag seems to be wrong as
well -- or misleading at best.
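
For anyone who wants to check this on their own machine, here is a minimal sketch. The sample octets are made up for illustration. The result may depend on your Perl and Encode versions, so I show no expected output.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Pure ASCII input, chosen only for illustration.
my $octets = "plain ascii";
my $string = decode("utf8", $octets);

# utf8::is_utf8 reports the internal flag of the scalar, nothing more.
printf "utf8 flag after decode: %s\n",
    utf8::is_utf8($string) ? "on" : "off";
```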

> my $string = decode("utf8", $octets);
>
> ...do whatever string operations in Perl
>
> $wxobject->SetValue($string);
>
> will always work OK just providing something you did in
> ...do whatever string operations in Perl
> didn't strip the utf8 flag off.

I seem to recall that Perl will sometimes upgrade a string
automatically, but I have never heard of it downgrading one. So unless
you downgrade explicitly, this is not supposed to happen.

> Clear as mud?
> For me too.

Me too :(

> For myself, if I were writing code that handled multibyte char sets in
> existing Wx releases I would do
>
> my $string = decode("UTF-8", $octets);
>
> ...do whatever string operations in Perl
>
> utf8::upgrade($string);
> $wxobject->SetValue($string);
>
> If you believe that utf8::upgrade($string) is not necessary, then
> don't use it. If your belief is correct, all will work fine.

As long as the string operations did not downgrade the string,
utf8::upgrade($string) is a harmless no-op.
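
A small sketch of what I mean by harmless, assuming a sample string I made up: utf8::upgrade changes only the internal representation, so the characters and the length stay the same as far as Perl code can tell.

```perl
use strict;
use warnings;

my $plain    = "caf\x{e9}";   # sample text; \x{e9} fits in a single byte
my $upgraded = $plain;
utf8::upgrade($upgraded);     # switch the internal representation to UTF-8

printf "flag before: %s, after: %s\n",
    (utf8::is_utf8($plain)    ? "on" : "off"),
    (utf8::is_utf8($upgraded) ? "on" : "off");
printf "same characters: %s\n", $plain eq $upgraded ? "yes" : "no";
printf "same length: %s\n",
    length($plain) == length($upgraded) ? "yes" : "no";

# Upgrading an already upgraded string changes nothing further.
utf8::upgrade($upgraded);
```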

> The more I think about this and actually test what happens, the more I
> lean towards just always expecting string values that get passed to
> wxPerl wrappers to be UTF-8.

When passing a Perl string to the external world it must always be
encoded. This already holds for writing data to files.

The question is: do we consider wxWidgets to be 'external world'? The
answer is most likely 'yes'. But more importantly: do we consider wxPerl
to be 'external world'? I'd say 'no'. Therefore, what I'd expect to pass
to a wxPerl routine is a string in Perl's internal encoding. The wxPerl
wrapper should take care of encoding the string into whatever encoding
wxWidgets requires.
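
To make the boundary concrete, here is a rough sketch of the kind of wrapper I have in mind. The names set_value() and send_to_toolkit() are hypothetical stand-ins, not wxPerl functions, and UTF-8 as the target encoding is an assumption.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $string = "caf\x{e9}";               # 4 characters inside Perl
my $octets = encode("UTF-8", $string);  # 5 octets for the outside world

printf "%d characters, %d octets\n", length($string), length($octets);

# Hypothetical stand-in for the external library; it only sees octets.
sub send_to_toolkit {
    my ($bytes) = @_;
    return length $bytes;
}

# Hypothetical wrapper: the caller passes a Perl character string and
# the wrapper encodes it at the boundary.
sub set_value {
    my ($text) = @_;
    return send_to_toolkit(encode("UTF-8", $text));
}

set_value($string);
```

The caller never touches octets here; only the wrapper knows which encoding the other side wants.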

Compare this to {some, several, many, all} DBD drivers that handle the
internal/UTF-8 conversions transparently.

Alternatively, it's an (almost) equally good decision to require all
strings passed to a wxPerl routine to be encoded in UTF-8. For a program
that is correctly equipped to handle multibyte encoded strings it will
not make a difference.

-- Johan
