I have tried this but it still displays squares instead of UTF-8 chars.
(I deleted the line with decode() and I set the font to "Arial".)

The new test file is at:

http://maestrodex.ro/static/test3.zip

--Octavian

----- Original Message -----
From: "Mark Dootson" <mark.doot...@znix.com>
To: "Octavian Rasnita" <orasn...@gmail.com>
Cc: <steveco.1...@gmail.com>; <wxperl-users@perl.org>
Sent: Tuesday, April 30, 2013 10:43 PM
Subject: Re: Can we print UTF-8 chars in Wx::TextCtrl fields?


Hi,

Comment out the line

  $text = decode('utf8', $text );

you do not need it.

Change the font name requested to 'Arial'.

Everything should work.

I'll try to figure out if there's a way to query a font to check if it has glyphs for particular code points. The old Font Encoding setting seems useless here.

Regards

Mark



On 30/04/2013 20:29, Octavian Rasnita wrote:
Hi Mark,

I tried your suggestion and removed the wxVSCROLL constant from the
attributes of the Wx::TextCtrl constructor, but the UTF-8 encoded chars
still appear as squares.
I use Windows XP Pro and ActivePerl 5.14.2.
(I need to use ActivePerl and not another distribution because I need to
create a COM server with this application.)

Then I tried adding:
utf8::upgrade( $text );
then
$textfield->AppendText( $text );

But no difference. Those chars still appear as squares.

Then I also added:

use Encode;
$text = decode('utf8', $text );
utf8::upgrade( $text );

But this time it gave the following error:

Cannot decode string with wide characters at D:/usr/lib/Encode.pm line 176.

The scalar $text is obtained from an SQLite database, and I connect to it
using:

my $dbh = DBI->connect("dbi:SQLite:test.db");
$dbh->do("PRAGMA cache_size = 80000");
$dbh->do("PRAGMA synchronous = OFF");
$dbh->{sqlite_unicode} = 1;
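
A possible explanation for the error above, offered as an assumption rather than a confirmed diagnosis: with sqlite_unicode enabled, DBD::SQLite returns text that is already decoded into characters, so calling decode() on it a second time fails as soon as the string holds a character above 255. The minimal sketch below reproduces that failure without a database; the literal "\x{021B}" merely stands in for a fetched value.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in (hypothetical) for a value fetched with sqlite_unicode = 1:
# one already-decoded character, U+021B (t with comma below).
my $text = "\x{021B}";

# Decoding a string that is already characters croaks.
my $ok    = eval { decode('utf8', $text); 1 };
my $error = $ok ? '' : $@;

print "decode failed: $error" unless $ok;
```

If this is what happens in the application, the fetched value can be used as it is, without the extra decode() call.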

The new code with the SQLite db is at:

http://maestrodex.ro/static/test2.zip

I selected the record from this DB on the command line and saw that
the special char "ț" appears as 2 chars, so I think the char is stored
correctly in the DB.

--Octavian

----- Original Message -----
From: "Mark Dootson" <mark.doot...@znix.com>
To: <steveco.1...@gmail.com>; <wxperl-users@perl.org>
Sent: Monday, April 29, 2013 4:32 PM
Subject: Re: Can we print UTF-8 chars in Wx::TextCtrl fields?


Hi,

A Perl scalar has a character buffer to store character or byte data.
This data can be interpreted and stored by Perl in one of two formats:

1. Perl's internal data format
2. A number of octets (bytes) representing a UTF-8 encoded string.

Internally it is just a memory buffer. Each scalar has a utf8 flag.
This tells Perl internally how to interpret its data buffer. Either as
Perl's internal data format or as UTF-8 encoded text. If the utf8 flag
is on, Perl regards the buffer as UTF-8 encoded text. If the utf8 flag
is off, Perl regards the buffer as containing data in Perl's internal
format.
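
The flag can be inspected directly with utf8::is_utf8. The following is only a small illustrative sketch: the two octets are the UTF-8 encoding of "ț" (U+021B), and the variable names are chosen for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw octets: the UTF-8 encoding of U+021B is 0xC8 0x9B.
my $binary = "\xC8\x9B";

# Decoded text: a single character.
my $string = decode('UTF-8', $binary);

printf "binary: length=%d flag=%s\n",
    length($binary), utf8::is_utf8($binary) ? 'on' : 'off';
printf "string: length=%d flag=%s\n",
    length($string), utf8::is_utf8($string) ? 'on' : 'off';
```

Here the octet string reports length 2 with the flag off, and the decoded string reports length 1 with the flag on.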

So, say I load some binary data that I know is text encoded using
'ISO-8859-1'.

Then I would do:

my $string = decode('ISO-8859-1', $binary);

This gets $string which contains data in Perl's internal format. The
utf8 flag for the scalar '$string' is off. As you have noted below, I
can't pass '$binary' to any of Perl's string functions. The results
will be unpredictable and mostly bad.

The evil starts due to some special features when we use decode to
convert a UTF-8 encoded string.

my $string = decode('utf8', $binary);

If $binary can be converted to $string using single byte characters,
then $string will be in Perl's internal data format and marked as
such (utf8 flag off). If $binary contains multiple byte characters
then $string will contain a series of bytes representing a UTF-8
encoded string and the scalar '$string' will have the utf8 flag on.

Within Perl it should not matter whether the scalar is marked UTF-8 or
not - so long as the utf8 flag correctly reflects what's in the
scalar's data buffer.

The problem comes when we come to pass the data to the wxWidgets library.

The source macro that does this is:

#define WXSTRING_INPUT( var, type, arg ) \
  var =  ( SvUTF8( arg ) ) ? \
           wxString( SvPVutf8_nolen( arg ), wxConvUTF8 ) \
         : wxString( SvPV_nolen( arg ), wxConvLibc );


So basically, if the scalar is marked as 'utf8' then it gets converted
into a wxString as such. If not, you're at the mercy of libc and local
system settings. It may work. It may not.


Solution - your conversion of external data should be

 my $string = decode($encoding, $binary);
 utf8::upgrade($string);

This should be platform independent and work - always. Perl's string
functions should all work OK on $string.
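
The two-step conversion can be wrapped in a small helper. This is a sketch only: the name text_for_wx is hypothetical and not part of Wx or Encode, and the sample input is made up.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode external octets, then force the utf8 flag on
# so the scalar is always handed over as UTF-8.
sub text_for_wx {
    my ($encoding, $binary) = @_;
    my $string = decode($encoding, $binary);
    utf8::upgrade($string);
    return $string;
}

# "caf\xE9" is "café" encoded as ISO-8859-1 octets.
my $latin1 = text_for_wx('ISO-8859-1', "caf\xE9");
```

The result could then be passed to something like $textfield->AppendText( $latin1 ).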

The key points

my $string = decode('utf8', $binary);

It depends on the content of $binary whether $string has the utf8 flag
set.

my $string = decode('utf8', $binary);
utf8::upgrade( $string );

$string always has utf8 flag set. You could just do
utf8::upgrade($binary) but that would be a special case for when
$binary actually contains UTF-8 bytes. The two step method applies to
any encoding.

Perl can't know that a scalar contains UTF-8 encoded text unless you
tell it.

The statement:

'use utf8;' is not needed anywhere here, of course, as it only indicates
that the source code is encoded in UTF-8. Nothing more. Functions such as
utf8::upgrade are always available.

If you have a list of scalars containing strings as in

@combo_options

then the same applies - to each individual scalar / string in the list.
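
For a list, one way to apply the same two steps to every element is a map. This is a sketch, and @raw_options with its contents is a made-up example of UTF-8 octets read from a file or database.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw option labels as UTF-8 octets.
my @raw_options = ("Bra\xC8\x99ov", "Constan\xC8\x9Ba", "Cluj");

# Decode each element and force its utf8 flag on.
my @combo_options = map {
    my $string = decode('UTF-8', $_);
    utf8::upgrade($string);
    $string;
} @raw_options;
```

Every element of @combo_options then carries the utf8 flag, including the plain ASCII one.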


Hope it helps.


Mark


On 29/04/2013 12:29, steveco.1...@gmail.com wrote:

Hi Mark, I'm a relative newcomer to utf8 so please take everything I say
with a pinch of salt, but your answer looks a bit qualified: if scalar, if
marked. That implies if I want a Perl list (say @combo_options) for a
Wx::ComboBox, then that won't work? Is that how it is?

And I don't know what marked means.

The real problem for me is that this feels like the wrong place to decode.
There are lots of things I might want to do with a string before I display
it.  I might want to sort it, or trim white space, or substitute a
place-marker with a value.  And for these I need it to be decoded before I
process it. If I have a very simple app with no string processing, then
this approach would be great but not otherwise.

I did have a lot of issues with utf8 at the beginning: sometimes I had
display issues with a utf8 string and sometimes not.  There seemed to be no
particular rhyme or reason to it.  And as you say, it works differently on
Windows and Linux.  Finally, everything is very sensitive to small errors,
like having a non-existent style code.

So I use a policy which is that when I read a value into my program from a
database or a file, I always decode immediately.  That way I know that all
my variables are decoded and processable.

Then I encode before I write back to the file or db.
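
As a rough sketch of that policy (the sample octets and variable names are invented for illustration): decode once on input, do all string processing on the decoded text, and encode once on output.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: "  țară  " as UTF-8 octets, e.g. read from a file.
my $octets_in = "  \xC8\x9Bar\xC4\x83  ";

# Decode immediately on input.
my $text = decode('UTF-8', $octets_in);

# Process as characters: trim leading and trailing white space.
$text =~ s/^\s+|\s+$//g;

# Encode just before writing back to the file or db.
my $octets_out = encode('UTF-8', $text);
```

After trimming, $text holds 4 characters and $octets_out holds the 6 octets that encode them.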

If I have an issue now, it is always where I have not done a:

$var = decode("utf8",$row->{ATT_BOOKING_COMMENT_TXT}) ;

Anyhow let me know what you think.

Regards

Steve




