Re: wxString and UTF-8, utf8 etc etc etc again

2013-05-02 Thread Johan Vromans
Mark Dootson mark.doot...@znix.com writes:

 None of my machines can be ASCII or EBCDIC by whatever definition this
 doc entry uses [...] What exactly is an ASCII machine?

ASCII just means: non-EBCDIC.

 Anyhow, I find that after

 $string = decode("utf8", $octets)

 $string always has the utf8 flag set, even if $octets is entirely
 ASCII data.

Yes, that is my experience as well. And this does not conform to the
documentation. I've been informed that the documentation is wrong, and
that *all* documentation concerning the utf8 flag seems to be wrong as
well -- or misleading at best.
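
A quick way to check this on a given installation is to inspect the flag
directly. This is only a sketch: the sample octets are made up, and it
prints whatever the local perl and Encode report rather than assuming a
particular result.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: pure ASCII octets, as if read from a file or socket.
my $octets = "plain ascii";
my $string = decode("utf8", $octets);

# The decoded string compares equal to the octets it came from ...
print "same content: ", ($string eq $octets ? "yes" : "no"), "\n";

# ... while the internal flag is whatever this perl/Encode chose.
print "utf8 flag before decode: ", (utf8::is_utf8($octets) ? "on" : "off"), "\n";
print "utf8 flag after decode:  ", (utf8::is_utf8($string) ? "on" : "off"), "\n";
```

Either way the comparison with eq succeeds, which is the point that matters
to calling code.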

 my $string = decode("utf8", $octets);

 ...do whatever string operations in Perl

 $wxobject->SetValue($string);

 will always work OK just providing something you did in
 ...do whatever string operations in Perl
 didn't strip the utf8 flag off.

I seem to recall that Perl will sometimes upgrade a string
automatically, but I have never heard of it downgrading one. So unless
you downgrade explicitly, this is not supposed to happen.

 Clear as mud?
 For me too.

Me too :(

 For myself, if I were writing code that handled multibyte char sets in
 existing Wx releases I would do

 my $string = decode("UTF-8", $octets);

 ...do whatever string operations in Perl

 utf8::upgrade($string);
 $wxobject->SetValue($string);

 If you believe that utf8::upgrade($string) is not necessary, then
 don't use it. If your belief is correct, all will work fine.

As long as the string operations did not downgrade the string,
utf8::upgrade($string) is a harmless no-op.
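
That upgrading changes only the internal representation, and not the
string's value, can be sketched as follows. The sample string is an
assumption for illustration.

```perl
use strict;
use warnings;

# A string with one non-ASCII character (U+00E9).
my $string = "caf\x{e9}";
my $copy   = $string;

utf8::upgrade($copy);    # switch internal storage to UTF-8
utf8::upgrade($copy);    # calling it again changes nothing

# Same characters and same length; only the internal flag is affected.
print "equal:  ", ($copy eq $string ? "yes" : "no"), "\n";
print "length: ", length($copy), "\n";
print "flag:   ", (utf8::is_utf8($copy) ? "on" : "off"), "\n";
```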

 The more I think about this and actually test what happens, the more I
 lean towards just always expecting string values that get passed to
 wxPerl wrappers to be UTF-8.

When passing a Perl string to the external world it must always be
encoded. This already holds for writing data to files.
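
For the file case, the decode-on-input, encode-on-output pattern looks
roughly like this. The sample data and the in-memory file handle are
assumptions chosen to keep the sketch self-contained.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 octets for "caf" followed by U+00E9.
my $octets = "caf\xc3\xa9";

# Decode at the boundary: from here on we work with characters.
my $string = decode("UTF-8", $octets);
my $upper  = uc $string;

# Encode again before the data leaves Perl, here into an in-memory file.
my $buffer = '';
open my $out, '>', \$buffer or die "cannot open in-memory file: $!";
binmode $out;
print {$out} encode("UTF-8", $upper);
close $out;

print length($string), " characters in, ", length($buffer), " octets out\n";
```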

The question is: do we consider wxWidgets to be 'external world'? The
answer is most likely 'yes'. But more important: do we consider wxPerl
to be 'external world'? I'd say 'no'. Therefore, what I'd expect to pass
to a wxPerl routine is a string in Perl's internal encoding. The wxPerl
wrapper should take care of encoding the string into whatever encoding
wxWidgets requires.

Compare this to {some, several, many, all} DBD drivers that handle the
internal/UTF-8 conversions transparently.

Alternatively, it's an (almost) equally good decision to require all
strings passed to a wxPerl routine to be encoded in UTF-8. For a program
that is correctly equipped to handle multibyte encoded strings it will
not make a difference.

-- Johan


Re: wxString and UTF-8, utf8 etc etc etc again

2013-05-02 Thread Mark Dootson

Hi,

On 02/05/2013 10:34, Johan Vromans wrote:


 The question is: do we consider wxWidgets to be 'external world'? The
 answer is most likely 'yes'. But more important: do we consider wxPerl
 to be 'external world'? I'd say 'no'. Therefore, what I'd expect to pass
 to a wxPerl routine is a string in Perl's internal encoding. The wxPerl
 wrapper should take care of encoding the string into whatever encoding
 wxWidgets requires.

 Compare this to {some, several, many, all} DBD drivers that handle the
 internal/UTF-8 conversions transparently.

 Alternatively, it's an (almost) equally good decision to require all
 strings passed to a wxPerl routine to be encoded in UTF-8. For a program
 that is correctly equipped to handle multibyte encoded strings it will
 not make a difference.

 -- Johan


I'd agree with that. For Wx 0.9922 I've changed the conversion to 
wxString so that it always uses a UTF-8 conversion.


At least I'm assuming that's what SvPVutf8_nolen does. It seems to, and 
the docs say it does. Contrary advice from perlapi utf8 gurus most 
welcome. Essentially I want


char * buffer = SomePerlApi( SV );
where buffer is the address at the start of a stream of UTF-8 octets.
I think SvPVutf8_nolen( SV ) does the job.
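
At the Perl level, the behaviour wanted here (the same UTF-8 octets no
matter how the scalar is stored internally) can be sketched with Encode.
This is an analogue for illustration only, not the wxPerl XS code, and the
sample string is made up.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $downgraded = "caf\x{e9}";    # four characters, one of them non-ASCII
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);        # same characters, stored as UTF-8 internally

# Encoding looks at the characters, not at the internal flag,
# so both scalars yield the same stream of UTF-8 octets.
my $bytes_a = encode("UTF-8", $downgraded);
my $bytes_b = encode("UTF-8", $upgraded);

print "identical octets: ", ($bytes_a eq $bytes_b ? "yes" : "no"), "\n";
print "octet count: ", length($bytes_a), "\n";
```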

In testing this change though, I had to go back to Perl 5.8.9 to 
contrive a case where it made any difference. Even there, it might be an 
issue with older module versions - but given it was Perl 5.8.9 I was not 
inclined to investigate further.


So I'd say from testing that wxPerl already handled things transparently 
and continues to do so.


In short, I think that

my $string = decode( $someencoding, $externaldata );

or

my $string = $some_other_Perl_string;

# ... optionally some string operations on $string

$wxobject->SetValue( $string );

should always work, and if it doesn't, it is probably a bug in wxPerl.
This is regardless of utf8 flags etc etc. It should just work from an
end user perspective.


If someone demonstrates an instance where the above doesn't work, I'll 
endeavour to get it fixed.



Cheers

Mark


RE: wxString and UTF-8, utf8 etc etc etc again

2013-05-01 Thread Steve Cookson
Hi Guys,

I don't have anything broken in this release in the 2 languages that I
currently support (English and Portuguese).  But even so, the whole utf8
process has been a bit time consuming. As we become more multilingual, I'm
thinking that a global change to rename decode to libDecode (or something)
might put all the contentious stuff in one place rather than spread out all
over my code.

So, at the risk of suggesting that my Grandmother suck eggs:

use Encode qw(decode);

sub libDecode ($$){
    #
    # Standard decode subroutine to put all UTF bits in one place.
    #
    my $encoding = shift;
    my $string = shift;

    # I could even put:
    #
    #   $encoding = "UTF-8" if $encoding eq "utf8";
    #   just changed in one place in case it doesn't work.

    $string = decode($encoding, $string);
    utf8::upgrade($string);
    return $string;
}

Or just

sub libDecode ($$){
    return decode($_[0], $_[1]);
}

At least all the things that might go wrong will all be here.

Regards

Steve.

-----Original Message-----
From: Mark Dootson [mailto:mark.doot...@znix.com] 
Sent: 01 May 2013 18:11
To: wxperl-users@perl.org
Subject: wxString and UTF-8, utf8 etc etc etc again

Hi,

perldoc for the module Encode says:

-
CAVEAT: When you run $string = decode("utf8", $octets), then $string
might not be equal to $octets. Though both contain the same data, the
UTF8 flag for $string is on unless $octets consists entirely of ASCII
data on ASCII machines or EBCDIC on EBCDIC machines.
-

None of my machines can be ASCII or EBCDIC by whatever definition this 
doc entry uses as my testing on a variety of platforms shows that on 
Perl 5.8.8 through Perl 5.16.2 the above is most certainly not true.
I shouldn't be surprised really. What exactly is an ASCII machine?

Anyhow, I find that after

$string = decode("utf8", $octets)

$string always has the utf8 flag set, even if $octets is entirely ASCII 
data.

So ...

my $string = decode("utf8", $octets);

...do whatever string operations in Perl

$wxobject->SetValue($string);

will always work OK just providing something you did in
...do whatever string operations in Perl
didn't strip the utf8 flag off. Not that you'd know if it did.

Clear as mud?

For me too.

For myself, if I were writing code that handled multibyte char sets in 
existing Wx releases I would do

my $string = decode("UTF-8", $octets);

...do whatever string operations in Perl

utf8::upgrade($string);
$wxobject->SetValue($string);

If you believe that utf8::upgrade($string) is not necessary, then don't
use it. If your belief is correct, all will work fine.

The more I think about this and actually test what happens, the more I 
lean towards just always expecting string values that get passed to 
wxPerl wrappers to be UTF-8. By that I mean don't bother to test if Perl 
thinks it is UTF-8 or not, just attempt to convert the data buffer 
assuming that it is. I think I'll do it for the next release of Wx.

Then stuff is much simpler and easier for users to understand, and it
fits with current dogma on how stuff should work.

The bottom line for anyone thinking 'is there something I'll have to 
change in my code?' - the answer is no. Unless it breaks. In which case 
- complain here.

Cheers

Mark


Re: wxString and UTF-8, utf8 etc etc etc again

2013-05-01 Thread Mark Dootson

Hi,

On 02/05/2013 00:17, Steve Cookson wrote:


 Or just

 sub libDecode ($$){
 return decode($_[0], $_[1]);
 }

 At least all the things that might go wrong will all be here.


You're unduly worried ( probably my fault ).

my $string = decode($encoding, $binary);

Is fine.

Cheers

Mark


Re: wxString and UTF-8, utf8 etc etc etc again

2013-05-01 Thread Octavian Rasnita
From: Mark Dootson mark.doot...@znix.com


 Hi,
 
 On 02/05/2013 00:17, Steve Cookson wrote:
 
 Or just

 sub libDecode ($$){
 return decode($_[0], $_[1]);
 }

 At least all the things that might go wrong will all be here.
 
 You're unduly worried ( probably my fault ).
 
 my $string = decode($encoding, $binary);
 


So there is no need for utf8::upgrade()?

Octavian