Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Mark Dootson mark.doot...@znix.com writes:

> On 30/04/2013 19:19, Johan Vromans wrote:
>> We may assume that the Perl string is in Perl's internal encoding.
>
> No we may not.

In that case you'll run into all kinds of encoding problems anyway. See e.g. perlunitut.

> I kind of like the existing solution which doesn't break existing code
> all over the place and simply requires the coder to be specific about
> the format of the data they are sending.

My main concern is: if I have correctly decoded string data, will it work when passed to wxWidgets? For example:

    $orig = readline($datafile);
    $line = decode( 'utf8', $orig );
    $w = Wx::StaticText->new( ... );
    $w->SetLabel($line);

If an explicit utf8::upgrade were required in this case, my feelings tell me something is wrong.

-- Johan
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Hi,

On 01/05/2013 07:34, Johan Vromans wrote:
> Mark Dootson mark.doot...@znix.com writes:
>> On 30/04/2013 19:19, Johan Vromans wrote:
>>> We may assume that the Perl string is in Perl's internal encoding.
>>
>> No we may not.
>
> In that case you'll run into all kinds of encoding problems anyway.

If you attempt any string operations, indeed you will.

> See e.g. perlunitut.
>
>> I kind of like the existing solution which doesn't break existing code
>> all over the place and simply requires the coder to be specific about
>> the format of the data they are sending.
>
> My main concern is: if I have correctly decoded string data, will it
> work when passed to wxWidgets? For example:
>
>     $orig = readline($datafile);
>     $line = decode( 'utf8', $orig );
>     $w = Wx::StaticText->new( ... );
>     $w->SetLabel($line);
>
> If an explicit utf8::upgrade were required in this case, my feelings
> tell me something is wrong.

Well, this morning I'm inclined to agree that this ought to be the case. At least for:

    $orig = readline($datafile);
    $line = decode( 'UTF-8', $orig );
    $w = Wx::StaticText->new( ... );
    $w->SetLabel($line);

On the other hand, I'm reluctant to introduce something that I'm certain will break someone's code somewhere (which is the entire basis for my objection to making a change).

So, my thinking is that I'll change it for builds against wxWidgets 2.9.x and above, and announce on this list and in the docs that strings passed for wxString must be valid UTF-8. I'll probably just use SvPVutf8_nolen on everything if this tests OK.

(For the info of the casual reader: the "force" part in SvPVutf8_force refers to changing the SV to have a pv (string) representation only. It has nothing to do with utf8. You would use it if you expected the C / C++ code might change the value directly, so it would force Perl to re-evaluate the next time you used the SV in a number context. In our code the pv value will never be changed directly.)

For anyone interested, the relevant code is in cpp/helpers.h, wrapped in a three-part #if / #elif / #else:

    #if defined(wxUSE_UNICODE_UTF8) && wxUSE_UNICODE_UTF8
    // Mac OSX and Linux
    #elif wxUSE_UNICODE
    // Windows
    #else
    // 2.8 ANSI build ( ignore it )
    #endif

The macros WXCHAR_INPUT, WXCHAR_OUTPUT, WXSTRING_INPUT and WXSTRING_OUTPUT are used via the typemap. The functions wxPli_wxChar_2_sv and wxPli_wxString_2_sv are also used throughout the Wx code.

You will note that the return value from a wxString or multibyte char is always flagged as utf8.

Regards
Mark
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Octavian,

I loaded up a Windows XP system, installed ActivePerl 5.14.5, and found the issue.

The 'Arial' font on your system is not the same as the 'Arial' font on my Windows Vista machine. Whether this is because I have a version of MS Office installed which comes with enhanced fonts, or whether it is a Vista vs XP issue, I am not sure.

Anyway, a little experimentation found me a font installed by default on Windows XP systems that does seem to have a more extensive set of glyphs. If you use the font 'Microsoft Sans Serif' everything should then work (assuming you have XP Service Pack 2 installed).

Microsoft Sans Serif - from Wikipedia
--
Windows XP - SP1 = Version 1.10 of the font includes 1119 glyphs (1209 characters, 26 blocks), supporting Unicode ranges Alphabetic Presentation forms, Arabic, Arabic Presentation forms A-B, Cyrillic, General Punctuation, Greek and Coptic, Hebrew, Latin Extended-A, Latin Extended-B, Latin Extended Additional, Mathematical Operators, Thai. Supported code pages include 1250-1258, Macintosh US Roman, 874, 864, 862, 708. Font is smoothed at 0-6 points, hinted at 7-14 points, hinted and smoothed at 15 and above points. OpenType features include init, isol, medi, fina, liga for default Arabic script.

Windows XP - SP2 = Version 1.41 of the font includes 2257 glyphs (2301 characters, 28 blocks), which extended Unicode ranges to include Combining Diacritical Marks, Currency Symbols, Cyrillic Supplement, Geometric Shapes, Greek Extended, IPA Extensions, Number Forms, Spacing Modifier Letters. New OpenType scripts include Arabic MAR script. Additional OpenType features include rlig for Arabic scripts.

Windows Vista - Version 5.00 - includes 3053 glyphs (2788 characters, 36 blocks), which extended Unicode ranges to include Arabic Supplement, Combining Diacritical Marks Supplement, Combining Half Marks, Latin Extended-C, Latin Extended-D, Phonetic Extensions, Phonetic Extensions Supplement, Specials, Superscripts and Subscripts. New OpenType scripts include Arabic URD (Urdu), Cyrillic (default), Hebrew (default), Latin (default, Romanian), Thai (default). Additional OpenType features include ccmp, mark, mkmk for Arabic scripts; locl for Arabic URD (Urdu) script; mark, mkmk for default Cyrillic; dlig, ccmp, mark for default Hebrew; ccmp, mark, mkmk for Latin scripts; locl for Romanian Latin; ccmp, mark, mkmk for Thai.
--

I have attached an amended test.pl that works for me on a basic Windows XP install with ActivePerl 5.14.5. The only change absolutely required from your last test zip is to change the name of the font to 'Microsoft Sans Serif'.

Regards
Mark

On 01/05/2013 06:34, Octavian Rasnita wrote:
> I have tried this but it still displays squares instead of UTF-8 chars.
> (I deleted the line with decode() and I set the font to Arial.)
>
> The new test file is at: http://maestrodex.ro/static/test3.zip
>
> --Octavian
>
> ----- Original Message -----
> From: Mark Dootson mark.doot...@znix.com
> To: Octavian Rasnita orasn...@gmail.com
> Cc: steveco.1...@gmail.com; wxperl-users@perl.org
> Sent: Tuesday, April 30, 2013 10:43 PM
> Subject: Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
>
>> Hi,
>>
>> Comment out the line
>>
>>     $text = decode('utf8', $text );
>>
>> you do not need it. Change the font name requested to 'Arial'.
>> Everything should work.
>>
>> I'll try to figure out if there's a way to query a font to check if it
>> has glyphs for particular code points. The old Font Encoding setting
>> seems useless here.
>>
>> Regards
>> Mark
>>
>> On 30/04/2013 20:29, Octavian Rasnita wrote:
>>> Hi Mark,
>>>
>>> I tried your suggestion and I removed the constant wxVSCROLL from the
>>> attributes of the Wx::TextCtrl constructor, but the UTF-8 encoded chars
>>> still appear as squares.
>>>
>>> I use Windows XP Pro and ActivePerl 5.14.2. (I need to use ActivePerl
>>> and not another distribution because I need to create a COM server
>>> with this application.)
>>>
>>> Then I tried adding:
>>>
>>>     utf8::upgrade( $text );
>>>
>>> then
>>>
>>>     $textfield->AppendText( $text );
>>>
>>> But no difference. Those chars still appear as squares.
>>>
>>> Then I also added:
>>>
>>>     use Encode;
>>>     $text = decode('utf8', $text );
>>>     utf8::upgrade( $text );
>>>
>>> But this time it gave the following error:
>>>
>>>     Cannot decode string with wide characters at D:/usr/lib/Encode.pm line 176.
>>>
>>> The scalar $text is obtained from an SQLite database and I connect to
>>> it using:
>>>
>>>     my $dbh = DBI->connect("dbi:SQLite:test.db");
>>>     $dbh->do("PRAGMA cache_size = 8");
>>>     $dbh->do("PRAGMA synchronous = OFF");
>>>     $dbh->{sqlite_unicode} = 1;
>>>
>>> The new code with the SQLite db is at:
>>> http://maestrodex.ro/static/test2.zip
>>>
>>> I selected the record from this DB on the command line and I've seen
>>> that the special char ț appears as 2 chars, so I think the char is
>>> stored correctly in the DB.
>>>
>>> --Octavian
>>>
>>> ----- Original Message -----
>>> From: Mark Dootson mark.doot...@znix.com
>>> To: steveco.1...@gmail.com; wxperl-users@perl.org
>>> Sent: Monday, April
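The "Cannot decode string with wide characters" error quoted above can be reproduced without a database. This is a minimal sketch, assuming that a driver configured with sqlite_unicode = 1 hands back strings that are already decoded; the first decode() below stands in for that step, and the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 octets for U+021B (the letter t with comma below), as they
# would sit in a file or a database column.
my $octets = "\xC8\x9B";

# First decode: two octets become one Perl character. This stands in for
# the decoding that the database driver is assumed to have done already.
my $chars = decode('UTF-8', $octets);

# Second decode of the already decoded string. The string now holds a
# character above 0xFF, so it cannot be a byte buffer and Encode croaks.
my $ok  = eval { decode('UTF-8', $chars); 1 };
my $err = $ok ? '' : $@;

printf "length after first decode: %d\n", length $chars;
print  "second decode failed: $err" unless $ok;
```

Under that assumption, the fix is the one given in the reply: drop the extra decode() and pass the string on as it comes from the driver.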
RE: Can we print UTF-8 chars in Wx::TextCtrl fields?
Hi Guys,

> Well, this morning I'm inclined to agree that this ought to be the case.
> At least for:
>
>     $orig = readline($datafile);
>     $line = decode( 'UTF-8', $orig );
>     $w = Wx::StaticText->new( ... );
>     $w->SetLabel($line);
>
> On the other hand I'm reluctant to introduce something that I'm certain
> will break someone's code somewhere (which is the entire basis for my
> objection to making a change).
>
> So, my thinking is that I'll change it for builds against wxWidgets
> 2.9.x and above and announce on this list and in docs that strings
> passed for wxString must be valid UTF-8.

Well, all this just serves to deepen my confusion.

1) What is the difference between:

    $line = decode( 'UTF-8', $orig );

and

    $line = decode( 'utf8', $orig );

I use the latter and Octavian has used the former and they both seem to work. Why did the latter not work for Octavian? Which is correct?

2) Mark, your earlier logic seemed clear and unassailable, yet now you seem to change your mind. You said:

> So basically, if the scalar is marked as 'utf8' then it gets converted
> into a wxString as such. If not, you're at the mercy of libc and local
> system settings. It may work. It may not.
>
> Solution - your conversion of external data should be
>
>     my $string = decode($encoding, $binary);
>     utf8::upgrade($string);
>
> This should be platform independent and work - always. Perl's string
> functions should all work OK on $string.

I use:

    $line = decode( 'utf8', $orig );

and I never have a problem, but according to this logic that is luck. I accept this and I am happy to use utf8::upgrade($string);

I think we should assume that in the general case there will always be some Perl processing before wxWidgets sees the string. The general case is:

1 - Retrieve data from file or database (this may be automatically decoded or not, depending on the database and the driver);
2 - Do something to it (this may be a null operation);
3 - Pass to wxWidgets to display to user.

To preserve string lengths and string processing (e.g. a simple alphabetical sort in utf8), any decoding must take place between 1 and 2 above.

When you say:

> So, my thinking is that I'll change it for builds against wxWidgets
> 2.9.x and above

what does it mean? That you will include utf8::upgrade($string) in the interface? I can't see any harm in this. Setting a flag bit to 1 before an operation and again later seems, at worst, redundant. But if we end up in the position where decode is called twice, this will create problems for me. A doubly decoded value gets corrupted and becomes a diamond with a question mark in it, or some such value.

Regards
Steve.
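The three steps above can be sketched in plain Perl. This is only an illustration under stated assumptions: the input is a literal standing in for data read from a file, and show_in_widget() is a hypothetical stand-in for a real call such as $textctrl->AppendText($text).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Step 1: retrieve. The "file" here is a literal holding the UTF-8 octets
# of a four-letter word whose first and last letters take two bytes each.
my $octets = "\xC8\x9Bar\xC4\x83";

# Decode exactly once, between steps 1 and 2.
my $text = decode('UTF-8', $octets);

# Step 2: process. Character operations now count letters, not bytes.
my $byte_count = length $octets;
my $char_count = length $text;

# Step 3: hand over. show_in_widget() is hypothetical; it only reports
# what it received instead of drawing anything.
sub show_in_widget {
    my ($string) = @_;
    return sprintf 'widget received %d characters', length $string;
}

print "bytes: $byte_count, characters: $char_count\n";
print show_in_widget($text), "\n";
```

Decoding at a single, known point is what keeps a second decode from ever being applied to the same data.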
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Hi,

On 01/05/2013 16:49, steveco.1...@gmail.com wrote:
> Well, all this just serves to deepen my confusion.
>
> 1) What is the difference between:
>
>     $line = decode( 'UTF-8', $orig );
>
> and
>
>     $line = decode( 'utf8', $orig );

Always use

    decode( 'UTF-8', $orig );

'UTF-8' means what it says. In my opinion, 'utf8' means something really quite like UTF-8 in all but a few respects, but which isn't UTF-8 and is a leftover from the dog's breakfast of Perl Unicode string handling, encoding and source code handling that took a decade to fix.

Perl's documents currently refer to UTF-8 as 'strict UTF-8'. There's no sanity to it. Why the docs don't just say that 'utf8' is really a leftover from an era of big mistakes, I don't know.

> Why did the latter not work for Octavian?

It isn't the difference between 'utf8' and 'UTF-8' that caused Octavian's code to fail.

> 2) Mark, your earlier logic seemed clear and unassailable, yet now you
> seem to change your mind.

I got worn down. It is, after all, a community project. The logic seemed clear and unassailable to me too. When faced with an argument that simply ignores everything you say, you are left with the option of repeating yourself for ever, ignoring the opposite argument, or giving up and agreeing. Life is short, so I gave up and agreed. I always try to take the approach that even if the other fellow is wrong in principle, what exactly would be the downside to agreeing? It leaves you with the time and energy available to go on repeating yourself forever on the important stuff.

It won't break much, I don't think.

> I use:
>
>     $line = decode( 'utf8', $orig );
>
> and I never have a problem, but according to this logic that is luck. I
> accept this and I am happy to use utf8::upgrade($string);
>
> I think we should assume that in the general case there will always be
> some Perl processing before wxWidgets sees the string. The general case
> is:
>
> 1 - Retrieve data from file or database (this may be automatically
>     decoded or not, depending on the database and the driver);
> 2 - Do something to it (this may be a null operation);
> 3 - Pass to wxWidgets to display to user.
>
> To preserve string lengths and string processing (e.g. a simple
> alphabetical sort in utf8), any decoding must take place between 1 and
> 2 above.
>
> When you say:
>
>> So, my thinking is that I'll change it for builds against wxWidgets
>> 2.9.x and above
>
> what does it mean? That you will include utf8::upgrade($string) in the
> interface?

No, the code will just assume that the string passed is valid UTF-8 and attempt to convert it to a wxString accordingly. It will never call the libc option.

> I can't see any harm in this. Setting a flag bit to 1 before an
> operation and again later seems, at worst, redundant. But if we end up
> in the position where decode is called twice, this will create problems
> for me. A doubly decoded value gets corrupted and becomes a diamond with
> a question mark in it, or some such value.

Hope the above assures you this won't happen. (We won't be double decoding.)

Cheers
Mark
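To see that 'UTF-8' and 'utf8' really are two different codecs in the Encode module, one can ask Encode which encoding object each name resolves to. A minimal sketch, assuming only a stock Encode installation; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(find_encoding decode);

# Resolve both spellings to their canonical encoding names.
my $strict = find_encoding('UTF-8')->name;   # the strict codec
my $lax    = find_encoding('utf8')->name;    # Perl's older, permissive variant

print "UTF-8 resolves to: $strict\n";
print "utf8  resolves to: $lax\n";

# For ordinary well-formed input the two give the same characters; they
# only differ on ill-formed or out-of-range sequences.
my $octets = "\xC8\x9B";
my $same   = decode('UTF-8', $octets) eq decode('utf8', $octets);
print "same result for well-formed input: ", ($same ? "yes" : "no"), "\n";
```

This is consistent with the point made above that the choice of name was not what broke Octavian's code: for well-formed input both spellings decode identically.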
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Yep, good to know.

It would be nice if WxPerl would announce somehow that a font doesn't have the necessary glyphs (maybe with a warning).

And btw, if the constant wxVSCROLL is not necessary (did I understand correctly that it is not necessary under Linux either?) and if it creates problems under Windows, it would be helpful if it gave an error when it is used, or at least if it were skipped on platforms that have a problem with it.

--Octavian

----- Original Message -----
From: Mark Dootson mark.doot...@znix.com
To: wxperl-users@perl.org
Sent: Wednesday, May 01, 2013 8:18 PM
Subject: Re: Can we print UTF-8 chars in Wx::TextCtrl fields?

> Hi,
>
> Just a clarification. Setting the font so that you get glyphs displayed
> properly is only an issue on Windows XP. More recent versions of Windows
> have default GUI fonts that have a much wider range of glyphs available,
> so this isn't an issue. Unless, strangely, you force 'Verdana', which
> seems to be the font Microsoft forgot.
>
> Regards
> Mark
>
> On 01/05/2013 17:35, Mark Dootson wrote:
>> Hi,
>>
>> On 01/05/2013 15:21, Octavian Rasnita wrote:
>>> Thank you Mark! Now the chars are displayed fine. Too bad that I need
>>> to use Windows...
>>>
>>> BTW, what happens if this program runs under Linux or Mac? Will
>>> WxPerl use another sans-serif font available under these platforms?
>>
>> Yes - a similar font chosen on the basis of wxFONTFAMILY_SWISS,
>> wxFONTSTYLE_NORMAL, wxFONTWEIGHT_NORMAL.
>>
>> For both Linux and MacOSX (modern editions I've just tested at least),
>> if the font does not contain a glyph to represent a character, the
>> font engine will check other fonts until it finds one that does.
>>
>> Regards
>> Mark
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Hi,

On 01/05/2013 20:14, Octavian Rasnita wrote:
> Yep, good to know.
>
> It would be nice if WxPerl would announce somehow that a font doesn't
> have the necessary glyphs (maybe with a warning).

Nice to have, but there is no reasonable and practical implementation I can think of. I am aware of how to check technically whether a given font has a particular glyph, but it is non-trivial. It also won't tell me whether the operating system font handler will correctly substitute a glyph from elsewhere. And it isn't part of wxWidgets.

It's only a problem in decade-old Windows XP anyway. Elsewhere, if you just accept the default GUI font for text type controls, all will be the best it can be.

> And btw, if the constant wxVSCROLL is not necessary (did I understand
> correctly that it is not necessary under Linux either?) and if it
> creates problems under Windows, it would be helpful if it gave an error
> when it is used, or at least if it were skipped on platforms that have
> a problem with it.

Not quite that simple. It is really just a number that gets added to a flags mask. It isn't unique. The number might actually mean something. It just doesn't mean wxVSCROLL in this context.

I suppose it might be possible to manually go through the wxWidgets code and check what the acceptable flags are for every class. But we'd have to redo it for every wxWidgets release. For me, that is too much maintenance burden.

Sorry I couldn't add these, but I hope you understand why.

Regards
Mark
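The point about style flags being "just a number" can be illustrated with a small bitmask sketch. The constant names and values below are invented for the example and are not the real wxWidgets values; the only assumption carried over from the message above is that two unrelated styles may share the same bit.

```perl
use strict;
use warnings;

# Hypothetical style constants. Names and values are made up for this
# sketch; they are NOT taken from wxWidgets.
use constant {
    STYLE_MULTILINE    => 0x0020,
    STYLE_VSCROLL      => 0x0800,  # meaningful for a scrolled window
    STYLE_TEXT_SPECIAL => 0x0800,  # same bit, different meaning for a text control
};

# The caller combines flags into one integer.
my $style = STYLE_MULTILINE | STYLE_VSCROLL;

# The text control only sees the integer. It cannot tell which constant
# the caller used to set bit 0x0800, so it reads it as its own flag.
my $looks_like_special = ($style & STYLE_TEXT_SPECIAL) ? 1 : 0;

printf "style mask: 0x%04X\n", $style;
print  "text control sees its own special flag: $looks_like_special\n";
```

Because the control receives only the combined integer, rejecting an "unsuitable" constant would need a per-class table of acceptable flags, which is the maintenance burden described above.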
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Hi,

On 30/04/2013 15:38, Johan Vromans wrote:
>> 2. This data is the current default format for wxWidgets.
>
> Which I understand it may work if you're lucky.
>
>> I am of the opinion that this bit ( WXSTRING_INPUT ) already works as
>> well as it can do if given an SV and no other params. I certainly
>> would not want to change it to force all input to be valid UTF-8.
>> There is absolutely no reason to do so.
>
> Forcing valid UTF-8 will enhance both cases so that they will always
> work. This sounds like a good reason to me.

Only if the input actually is valid UTF-8. Something only the Perl coder can know / ensure. It isn't a requirement of the wxWidgets library.

> [ ... on upgrade ... ]
>
>> I don't think there's a coding error just because I use
>> utf8::upgrade($string). I don't need utf8::upgrade for my Perl code.
>> I need it to allow me to tell wxWidgets what's in $string.
>
> I understand this to mean: always do a utf8::upgrade *before* passing a
> string to wxWidgets. For me this is a signal that upgrading should take
> place on the Perl <-> wxWidgets boundary and not in the user program.

No, it means do a utf8::upgrade before passing a string that you know contains valid UTF-8, to ensure the scalar is marked as such.

>> I appreciate your view may be different and you're entitled to think
>> mine is wrong.
>
> My knowledge of the hairy details of Perl and Unicode is not sufficient
> to tell what/who is wrong or right. Hence my urge to understand.

I don't believe there is an actual wrong or right. I know that forcing the input to be marked as utf8 at the Perl <=> wxWidgets boundary will break any existing code that passes buffers that aren't valid UTF-8. It makes assumptions about the data buffer passed that are not universally true.

I think you are getting focused on the concept that 'decode' alone is enough and everything else should just work. I don't see this as the issue at all. I don't really care about the internals of decode and Perl's Unicode handling. It doesn't matter here.

I am simply saying that one can achieve consistent results when calling wxWidgets methods if one is explicit about the data format of the data buffers passed. If there is a buffer known to contain valid UTF-8, then a simple shortcut is

    utf8::upgrade($string);

If I read your proposal correctly, you want to demand that all data buffers that may get passed to the wxString conversion function are valid UTF-8?

Hope I'm not missing the point.

Regards
Mark
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Mark Dootson mark.doot...@znix.com writes:

> Only if the input actually is valid UTF-8. Something only the Perl
> coder can know / ensure. It isn't a requirement of the wxWidgets
> library.

We may assume that the Perl string is in Perl's internal encoding. So I think it would be safe to encode the string in UTF-8 (this is basically what utf8::upgrade does) and pass it as a UTF-8 string to wxWidgets.

> I am simply saying that one can achieve consistent results when calling
> wxWidgets methods if one is explicit about the data format of the data
> buffers passed. If there is a buffer known to contain valid UTF-8, then
> a simple shortcut is
>
>     utf8::upgrade($string);

AFAIK, when a buffer contains valid UTF-8 (e.g., as the result of an earlier decode), utf8::upgrade is a no-op.

As of Perl 5.14, use feature 'unicode_strings' will make sure that all strings are, indeed, UTF-8. This takes the burden off the programmer to call utf8::upgrade (and to know when to call it).

> If I read your proposal correctly, you want to demand that all data
> buffers that may get passed to the wxString conversion function are
> valid UTF-8?

Not really. They should be in Perl's internal coding, and thus can be safely and transparently upgraded to UTF-8.

But in the end I think feature 'unicode_strings' will be the best and most elegant solution.

-- Johan
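The claim that utf8::upgrade is harmless to repeat can be checked with core Perl alone. A minimal sketch with an illustrative Latin-1 string; it shows only that the characters are unchanged and that a second call has nothing left to do, not how any particular wxPerl build behaves.

```perl
use strict;
use warnings;

# Four characters, one of them (0xE9) above the ASCII range.
my $original = "caf\xE9";
my $copy     = $original;

# First call: switch the internal storage of $copy to UTF-8.
utf8::upgrade($copy);
my $flag_after_first = utf8::is_utf8($copy) ? 1 : 0;

# Second call: the scalar is already stored as UTF-8, so nothing changes.
utf8::upgrade($copy);
my $flag_after_second = utf8::is_utf8($copy) ? 1 : 0;

# The logical string (its characters and its length) is the same throughout.
my $same_chars = ($copy eq $original) ? 1 : 0;

print "flag after first upgrade:  $flag_after_first\n";
print "flag after second upgrade: $flag_after_second\n";
print "characters unchanged: $same_chars, length: ", length($copy), "\n";
```

Note that this is about the storage format of characters Perl already has. It does not turn raw UTF-8 octets into characters; that is still the job of decode().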
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Hi Mark,

I tried your suggestion and I removed the constant wxVSCROLL from the attributes of the Wx::TextCtrl constructor, but the UTF-8 encoded chars still appear as squares.

I use Windows XP Pro and ActivePerl 5.14.2. (I need to use ActivePerl and not another distribution because I need to create a COM server with this application.)

Then I tried adding:

    utf8::upgrade( $text );

then

    $textfield->AppendText( $text );

But no difference. Those chars still appear as squares.

Then I also added:

    use Encode;
    $text = decode('utf8', $text );
    utf8::upgrade( $text );

But this time it gave the following error:

    Cannot decode string with wide characters at D:/usr/lib/Encode.pm line 176.

The scalar $text is obtained from an SQLite database and I connect to it using:

    my $dbh = DBI->connect("dbi:SQLite:test.db");
    $dbh->do("PRAGMA cache_size = 8");
    $dbh->do("PRAGMA synchronous = OFF");
    $dbh->{sqlite_unicode} = 1;

The new code with the SQLite db is at: http://maestrodex.ro/static/test2.zip

I selected the record from this DB on the command line and I've seen that the special char ț appears as 2 chars, so I think the char is stored correctly in the DB.

--Octavian

----- Original Message -----
From: Mark Dootson mark.doot...@znix.com
To: steveco.1...@gmail.com; wxperl-users@perl.org
Sent: Monday, April 29, 2013 4:32 PM
Subject: Re: Can we print UTF-8 chars in Wx::TextCtrl fields?

> Hi,
>
> A Perl scalar has a character buffer to store character or byte data.
> This data can be interpreted and stored by Perl in one of two formats:
>
> 1. Perl's internal data format
> 2. A number of octets (bytes) representing a UTF-8 encoded string.
>
> Internally it is just a memory buffer.
>
> Each scalar has a utf8 flag. This tells Perl internally how to
> interpret its data buffer: either as Perl's internal data format or as
> UTF-8 encoded text. If the utf8 flag is on, Perl regards the buffer as
> UTF-8 encoded text. If the utf8 flag is off, Perl regards the buffer as
> containing data in Perl's internal format.
>
> So, say I load some binary data that I know is text encoded using
> 'ISO-8859-1'. Then I would do:
>
>     my $string = decode('ISO-8859-1', $binary);
>
> This gets $string, which contains data in Perl's internal format. The
> utf8 flag for the scalar '$string' is off.
>
> As you have noted below, I can't pass '$binary' to any of Perl's string
> functions. The results will be unpredictable and mostly bad.
>
> The evil starts due to some special features when we use decode to
> convert a UTF-8 encoded string.
>
>     my $string = decode('utf8', $binary);
>
> If $binary can be converted to $string using single byte characters,
> then $string will be in Perl's internal data format and marked as such
> (utf8 flag off). If $binary contains multiple byte characters, then
> $string will contain a series of bytes representing a UTF-8 encoded
> string and the scalar '$string' will have the utf8 flag on.
>
> Within Perl it should not matter whether the scalar is marked UTF-8 or
> not, so long as the utf8 flag correctly reflects what's in the scalar's
> data buffer.
>
> The problem comes when we come to pass the data to the wxWidgets
> library. The source macro that does this is:
>
>     #define WXSTRING_INPUT( var, type, arg ) \
>         var = ( SvUTF8( arg ) ) ? \
>               wxString( SvPVutf8_nolen( arg ), wxConvUTF8 ) \
>             : wxString( SvPV_nolen( arg ), wxConvLibc );
>
> So basically, if the scalar is marked as 'utf8' then it gets converted
> into a wxString as such. If not, you're at the mercy of libc and local
> system settings. It may work. It may not.
>
> Solution - your conversion of external data should be
>
>     my $string = decode($encoding, $binary);
>     utf8::upgrade($string);
>
> This should be platform independent and work - always. Perl's string
> functions should all work OK on $string.
>
> The key points:
>
>     my $string = decode('utf8', $binary);
>
> It depends on the content of $binary whether $string has the utf8 flag
> set.
>
>     my $string = decode('utf8', $binary);
>     utf8::upgrade( $string );
>
> $string always has the utf8 flag set.
>
> You could just do utf8::upgrade($binary), but that would be a special
> case for when $binary actually contains UTF-8 bytes. The two step
> method applies to any encoding. Perl can't know that a scalar contains
> UTF-8 encoded text unless you tell it.
>
> The statement 'use utf8;' is not needed anywhere here, of course, as it
> indicates that the source code is encoded in UTF-8. Nothing more.
> Functions utf8::upgrade etc. are always available.
>
> If you have a list of scalars containing strings, as in @combo_options,
> then the same applies - to each individual scalar / string in the list.
>
> Hope it helps.
>
> Mark
>
> On 29/04/2013 12:29, steveco.1...@gmail.com wrote:
>> Hi Mark,
>>
>> I'm a relative newcomer to utf8, so please take everything I say with
>> a pinch of salt, but your answer looks a bit qualified: if scalar, if
>> marked. That implies that if I want a Perl list (say @combo_options)
>> for a Wx::ComboBox, then that won't work? Is that how it is? And I
>> don't know what marked means.
>>
>> The real problem for me
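The two step recipe from the quoted message (decode, then utf8::upgrade) can be applied to every element of a list such as @combo_options. A minimal sketch, assuming the raw data is ISO-8859-1; the sample values are invented, and the widget call itself is left out so the block runs without Wx installed.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Invented sample data: ISO-8859-1 octets as they might come from a file.
my @raw = ("caf\xE9", "plain", "na\xEFve");

# Step one: decode each element from its known external encoding.
my @combo_options = map { decode('ISO-8859-1', $_) } @raw;

# Step two: make sure every scalar is flagged as utf8. The loop variable
# is an alias, so the array elements themselves are upgraded in place.
utf8::upgrade($_) for @combo_options;

# Check the flag on every element; grep in scalar context counts misses.
my $unflagged   = grep { !utf8::is_utf8($_) } @combo_options;
my $all_flagged = $unflagged == 0 ? 1 : 0;

print "all elements flagged as utf8: $all_flagged\n";
printf "first option has %d characters\n", length $combo_options[0];
```

The upgraded list could then be handed to a control constructor in the usual way. Whether the upgrade step is still needed depends on the wxPerl build, as discussed elsewhere in this thread.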
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Hi,

Comment out the line

    $text = decode('utf8', $text );

you do not need it. Change the font name requested to 'Arial'. Everything should work.

I'll try to figure out if there's a way to query a font to check if it has glyphs for particular code points. The old Font Encoding setting seems useless here.

Regards
Mark
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
I have tried this but it still displays squares instead of UTF-8 chars. (I deleted the line with decode() and I set the font to Arial.) The new test file is at: http://maestrodex.ro/static/test3.zip --Octavian - Original Message - From: Mark Dootson mark.doot...@znix.com To: Octavian Rasnita orasn...@gmail.com Cc: steveco.1...@gmail.com; wxperl-users@perl.org Sent: Tuesday, April 30, 2013 10:43 PM Subject: Re: Can we print UTF-8 chars in Wx::TextCtrl fields? Hi, Comment out the line $text = decode('utf8', $text ); you do not need it. Change the font name requested to 'Arial'. Everything should work. I'll try to figure out if there's a way to query a font to check if it has glyphs for particular code points. The old Font Encoding setting seems useless here. Regards Mark On 30/04/2013 20:29, Octavian Rasnita wrote: Hi Mark, I tried your suggestion and I removed the constant wxVSCROLL from the attributes of Wx::TextCtrl constructor, but the UTF-8 encoded chars still appear as squares. I use Windows XP Pro and ActivePerl 5.14.2. (I need to use ActivePerl and not another distribution because I need to create a COM server with this application.) Then I tried adding: utf8::upgrade( $text ); then $textfield-AppendText( $text ); But no difference. Those chars still appear as squares. Then I also added: use Encode; $text = decode('utf8', $text ); utf8::upgrade( $text ); But this time it gave the following error: Cannot decode string with wide characters at D:/usr/lib/Encode.pm line 176. The scalar $text is obtain from an SQLite database and I connect to it using: my $dbh = DBI-connect(dbi:SQLite:test.db); $dbh-do(PRAGMA cache_size = 8); $dbh-do(PRAGMA synchronous = OFF); $dbh-{sqlite_unicode} = 1; The new cod with the SQLite db is at: http://maestrodex.ro/static/test2.zip I selected the record from this DB in command line and I've seen that the special char ț appears as 2 chars, so I think the char is added well in DB. 
--Octavian
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
I would guess you are working on Windows?

wxVSCROLL isn't in the list of styles available for wxTextCtrl. It isn't needed. Remove it and all works OK. It seems you can get away with it on Linux - but not on Windows.

For the Wx::Font you can just do

    my $font = Wx::Font->new( $FontSize, wxFONTFAMILY_SWISS, wxFONTSTYLE_NORMAL, wxFONTWEIGHT_NORMAL, 0, "Arial" );

Cheers
Mark

On 22/04/2013 20:51, Octavian Rasnita wrote:

Hi,

I have a text field defined as:

    $self->{defs} = Wx::TextCtrl->new( $self->{panel}, -1, "", wxDefaultPosition, [ 500, 400 ],
        wxTE_MULTILINE | wxTE_READONLY | wxVSCROLL | wxTE_PROCESS_ENTER | wxTE_RICH2 );

And I am trying to set a font for it using:

    my $font = Wx::Font->new( $FontSize, wxFONTFAMILY_SWISS, wxFONTSTYLE_NORMAL, wxFONTWEIGHT_NORMAL, 0,
        "Arial Unicode MS", wxFONTENCODING_SYSTEM );
    my $style = Wx::TextAttr->new;
    $style->Wx::TextAttr::SetFont($font);
    $self->{defs}->SetDefaultStyle($style);

But if I print UTF-8 chars in this field, it prints squares instead of special UTF-8 chars (non ASCII). Should it work this way, and might the text be incorrectly encoded as UTF-8?

I also read in the WxPerl documentation:

    The known font encodings are:
    ...
    wxFONTENCODING_UTF8,    // UTF-8 Unicode encoding

So I tried to replace wxFONTENCODING_SYSTEM with wxFONTENCODING_UTF8 in the code above. But this just makes WxPerl pop up a window that says:

    Wx::SimpleApp: unknown encoding
    No font for displaying text in encoding 'Unicode 8 bit (UTF-8)' found.
    Would you like to select a font to be used for this encoding (otherwise the text in this encoding will not be shown correctly)?
    Yes   No

So I created another small program to find which encodings are available:

    use Wx ':everything';
    use Data::Dump 'pp';
    my $enum = Wx::FontEnumerator->new;
    my @encodings = $enum->GetEncodings;
    print pp \@encodings;

And the result was:

    [
      "WINDOWS-1250", "WINDOWS-437", "unknown-87", "WINDOWS-1252",
      "WINDOWS-1255", "WINDOWS-1256", "WINDOWS-1253", "WINDOWS-1254",
      "WINDOWS-1257", "WINDOWS-1251", "unknown--1", "WINDOWS-874",
      "unknown--1", "WINDOWS-932", "WINDOWS-949", "unknown--1",
      "WINDOWS-936", "WINDOWS-950",
    ]

So there is no encoding that contains UTF-8 in its name. Does this mean that WxPerl can't print UTF-8 texts? Or is there something else I need to do?

In the same documentation page, the description of the Wx::Font constructor explains the encoding parameter as:

    encoding
    An encoding which may be one of
    wxFONTENCODING_SYSTEM
    wxFONTENCODING_DEFAULT
    wxFONTENCODING_ISO8859_1...15
    wxFONTENCODING_KOI8
    wxFONTENCODING_CP1250...1252

Only these items appear in this list of possible encodings, and none of them includes UTF8.

If possible, is there an example of writing text with a certain font that supports UTF-8?

Thanks.

--Octavian
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Hi, If your Perl scalar contains UTF-8 encoded text and is marked as such, then you shouldn't need any decoding functions. (Well, that is how it is supposed to work. If it doesn't, it is a bug). So, yes - if your scalars contain UTF-8 encoded text and are marked as such, that's all you should need. Regards Mark On 29/04/2013 10:25, steveco.1...@gmail.com wrote: Hi Mark, Are you saying that you don't need some form of utf8 decode function in wxPerl? And that it should work perfectly without it? Regards Steve
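The "Cannot decode string with wide characters" error reported elsewhere in this thread is consistent with this advice: it is what you would expect when decode() is applied to a scalar that already holds decoded characters. Below is a minimal sketch of that situation, assuming only the core Encode module. The sample bytes and the helper name decode_once are made up for illustration, and testing the utf8 flag is only a rough heuristic, not a general recommendation.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the UTF-8 bytes for U+021B (t with comma below).
my $bytes = "\xC8\x9B";

# First decode: two bytes in, one character out.
my $text = decode('UTF-8', $bytes);

# Decoding the already-decoded character string again is an error;
# wrap it in eval so the sketch keeps running.
my $second_ok = eval { decode('UTF-8', $text); 1 };
my $error     = $@;

# Hypothetical helper: decode only when the scalar is not already
# marked as holding characters. A heuristic, not a general rule.
sub decode_once {
    my ($value) = @_;
    return utf8::is_utf8($value) ? $value : decode('UTF-8', $value);
}

my $safe = decode_once($text);
```

If a database driver already returns decoded strings, the extra decode() is simply not needed, which matches the advice to comment that line out.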
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Mark Dootson mark.doot...@znix.com writes: I would guess you are working on Windows? wxVSCROLL isn't in the list of styles available for wxTextCtrl. It isn't needed. Remove it and all works OK. It seems you can get away with it on Linux - but not on Windows. Ah! I was misled by the #!/usr/bin/perl that made me assume it was Linux... -- Johan
RE: Can we print UTF-8 chars in Wx::TextCtrl fields?
Mark wrote:

Solution - your conversion of external data should be

    my $string = decode($encoding, $binary);
    utf8::upgrade($string);

This should be platform independent and work - always. Perl's string functions should all work OK on $string.

So you are saying that if I change

    $var = decode("utf8", $row->{ATT_BOOKING_COMMENT_TXT});

to

    $var = decode("utf8", $row->{ATT_BOOKING_COMMENT_TXT});
    utf8::upgrade($var);

it will be more resilient and cater for more cases than just the decode option. And you also agree that I need to do this to use the Perl string functions?

Regarding:

'use utf8;' Is not needed anywhere here of course as it indicates that the source code is encoded in UTF-8. Nothing more. Functions utf8::upgrade etc. are always available.

Normally I use translate, so I don't need to 'use utf8;'. But data structures are language dependent. Take an address: it's not just zip-code versus post-code, but also the sequence of fields. Sometimes the street number goes first, sometimes last; sometimes you validate 'state', sometimes not. In these cases I started using imported XRC, but actually it was easier to just code it by language, hence 'use utf8;'.

Thanks Mark,

Regards
Steve
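For reference, the two-step conversion discussed in this message can be sketched as follows. This is only an illustration under stated assumptions: the input byte is a made-up ISO-8859-1 sample, and the sketch checks what happens after utf8::upgrade without claiming anything about the flag state after decode() alone.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the ISO-8859-1 byte for e-acute, as read from a file.
my $binary = "\xE9";

# Step 1: turn the external bytes into a Perl character string.
my $string = decode('ISO-8859-1', $binary);

# Keep a copy so the text can be compared before and after the upgrade.
my $before = $string;

# Step 2: force the internal representation to UTF-8 and set the utf8 flag.
utf8::upgrade($string);

# The text is unchanged as far as Perl's string functions are concerned;
# only the internal storage is now guaranteed to be flagged as UTF-8.
my $flag_on    = utf8::is_utf8($string) ? 1 : 0;
my $same_text  = $string eq $before ? 1 : 0;
my $char_count = length $string;
```

The point of the sketch is that utf8::upgrade does not change which characters the string contains, so string functions behave the same before and after it; it only affects which branch of the wxPerl conversion is taken.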
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
steveco.1...@gmail.com writes: As it is at the moment, I just use decode and I don't get any errors. But I do need to use decode. Whenever you bring data from outside Perl into Perl, you should decode it. If the data is ASCII (actually: Latin-1) it doesn't matter much, but if the data is encoded in anything else you *must* decode it into Perl's internal encoding (which happens to be Latin-1 or UTF-8, but that should not matter). When bringing data to the outside, it *must* be encoded to what the outside expects. When displaying data through Wx widgets, this encoding is handled for you. But when you write the data to a file you must encode it explicitly. Databases can be a problem, since it depends on the particular database whether you get the data already decoded or not. In general, modern databases should deliver and accept data in Perl's internal encoding. -- Johan
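The decode-on-input, encode-on-output rule described above might look like this in practice. This is a minimal sketch, not anyone's production code: in-memory filehandles stand in for a real file, and the sample text and variable names are invented for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical external data: the UTF-8 bytes for "cafe" with e-acute,
# followed by a newline.
my $input_bytes = "caf\xC3\xA9\n";

# Read from "outside" (an in-memory file stands in for a real one).
open my $in, '<', \$input_bytes or die "open: $!";
my $raw = <$in>;
close $in;

# Decode immediately, so everything inside the program is characters.
my $line = decode('UTF-8', $raw);
chomp $line;

# String functions now operate on characters, not bytes.
my $shout = uc $line;    # four characters, not five bytes

# Encode explicitly on the way back out.
my $output_bytes = '';
open my $out, '>', \$output_bytes or die "open: $!";
print {$out} encode('UTF-8', $shout), "\n";
close $out;
```

When the decoded string is passed to a Wx widget instead of a file, no explicit encode step is written, since that conversion is done by wxPerl.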
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Hi Mark,

Thank you for this great explanation. Much clearer than other documentation.

--Octavian

- Original Message -
From: Mark Dootson mark.doot...@znix.com
To: steveco.1...@gmail.com; wxperl-users@perl.org
Sent: Monday, April 29, 2013 4:32 PM
Subject: Re: Can we print UTF-8 chars in Wx::TextCtrl fields?

Hi,

A Perl scalar has a character buffer to store character or byte data. This data can be interpreted and stored by Perl in one of two formats:

1. Perl's internal data format
2. A number of octets (bytes) representing a UTF-8 encoded string.

Internally it is just a memory buffer.

Each scalar has a utf8 flag. This tells Perl internally how to interpret its data buffer: either as Perl's internal data format or as UTF-8 encoded text. If the utf8 flag is on, Perl regards the buffer as UTF-8 encoded text. If the utf8 flag is off, Perl regards the buffer as containing data in Perl's internal format.

So, say I load some binary data that I know is text encoded using 'ISO-8859-1'. Then I would do:

    my $string = decode('ISO-8859-1', $binary);

This gets $string, which contains data in Perl's internal format. The utf8 flag for the scalar '$string' is off.

As you have noted below, I can't pass '$binary' to any of Perl's string functions. The results will be unpredictable and mostly bad.

The evil starts due to some special features when we use decode to convert a UTF-8 encoded string.

    my $string = decode('utf8', $binary);

If $binary can be converted to $string using single byte characters, then $string will be in Perl's internal data format and marked as such (utf8 flag off). If $binary contains multiple byte characters, then $string will contain a series of bytes representing a UTF-8 encoded string and the scalar '$string' will have the utf8 flag on.

Within Perl it should not matter whether the scalar is marked UTF-8 or not - so long as the utf8 flag correctly reflects what's in the scalar's data buffer.

The problem comes when we come to pass the data to the wxWidgets library.
The source macro that does this is:

    #define WXSTRING_INPUT( var, type, arg ) \
      var = ( SvUTF8( arg ) ) ? \
        wxString( SvPVutf8_nolen( arg ), wxConvUTF8 ) \
      : wxString( SvPV_nolen( arg ), wxConvLibc );

So basically, if the scalar is marked as 'utf8' then it gets converted into a wxString as such. If not, you're at the mercy of libc and local system settings. It may work. It may not.

Solution - your conversion of external data should be

    my $string = decode($encoding, $binary);
    utf8::upgrade($string);

This should be platform independent and work - always. Perl's string functions should all work OK on $string.

The key points:

    my $string = decode('utf8', $binary);

It depends on the content of $binary whether $string has the utf8 flag set.

    my $string = decode('utf8', $binary);
    utf8::upgrade( $string );

$string always has the utf8 flag set.

You could just do utf8::decode($binary) but that would be a special case for when $binary actually contains UTF-8 bytes. The two step method applies to any encoding. Perl can't know that a scalar contains UTF-8 encoded text unless you tell it.

The statement 'use utf8;' is not needed anywhere here of course, as it indicates that the source code is encoded in UTF-8. Nothing more. Functions utf8::upgrade etc. are always available.

If you have a list of scalars containing strings as in @combo_options then the same applies - to each individual scalar / string in the list.

Hope it helps.

Mark

On 29/04/2013 12:29, steveco.1...@gmail.com wrote:

Hi Mark,

I'm a relative newcomer to utf8, so please take everything I say with a pinch of salt, but your answer looks a bit qualified: if scalar, if marked. That implies that if I want a Perl list (say @combo_options) for a Wx::ComboBox, then that won't work? Is that how it is? And I don't know what marked means.

The real problem for me is that this feels like the wrong place to decode. There are lots of things I might want to do with a string before I display it.
I might want to sort it, or trim white space, or substitute a place-marker with a value. And for these I need it to be decoded before I process it. If I have a very simple app with no string processing, then this approach would be great but not otherwise.

I did have a lot of issues with utf8 at the beginning: sometimes I had display issues with a utf8 string and sometimes not. There seemed to be no particular rhyme or reason to it. And as you say, it works differently on Windows and Linux. Finally, everything is very sensitive to small errors, like having a non-existent style code.

So I use a policy which is that when I read a value into my program from a database or a file, I always decode immediately. That way I know that all my variables are decoded and processable. Then I encode before I write back to the file or db. If I have an issue now, it is always where I have not done a: $var
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Mark Dootson mark.doot...@znix.com writes:

    #define WXSTRING_INPUT( var, type, arg ) \
      var = ( SvUTF8( arg ) ) ? \
        wxString( SvPVutf8_nolen( arg ), wxConvUTF8 ) \
      : wxString( SvPV_nolen( arg ), wxConvLibc );

So basically, if the scalar is marked as 'utf8' then it gets converted into a wxString as such. If not, you're at the mercy of libc and local system settings. It may work. It may not.

Solution - your conversion of external data should be

    my $string = decode($encoding, $binary);
    utf8::upgrade($string);

I'd say this is the wrong approach. The solution is to adjust the WXSTRING_PUT macro to check for the utf8 flag and handle accordingly.

-- Johan
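To make the two branches of the quoted macro concrete, here is a rough Perl-level imitation of the choice it makes. This is an assumption-laden sketch, not wxPerl code: the function name bytes_for_wx is hypothetical, and it only models which bytes and which converter name each branch would use, based on the macro text quoted above.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical stand-in for the two branches of the macro: returns the
# bytes that would be handed over and the name of the converter that
# would interpret them.
sub bytes_for_wx {
    my ($sv) = @_;
    return utf8::is_utf8($sv)
        ? ( encode_utf8($sv), 'wxConvUTF8' )    # flagged: UTF-8 bytes
        : ( $sv,              'wxConvLibc' );   # unflagged: raw bytes, locale decides
}

# The same one-character text (e-acute) in both internal representations.
my $unflagged = "\xE9";
utf8::downgrade($unflagged);
my $flagged = "\xE9";
utf8::upgrade($flagged);

my ($bytes_a, $conv_a) = bytes_for_wx($unflagged);
my ($bytes_b, $conv_b) = bytes_for_wx($flagged);
```

The two scalars compare equal inside Perl, yet they take different branches. That difference is what the explicit utf8::upgrade works around, and what a change to the macro would remove.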
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Hi, On 29/04/2013 20:06, Johan Vromans wrote: I'd say this is the wrong approach. The solution is to adjust the WXSTRING_PUT macro to check for the utf8 flag and handle accordingly. -- Johan That's exactly what it does, unless I've misunderstood. Regards Mark
Re: Can we print UTF-8 chars in Wx::TextCtrl fields?
Mark Dootson mark.doot...@znix.com writes: Hi, On 29/04/2013 20:06, Johan Vromans wrote: I'd say this is the wrong approach. The solution is to adjust the WXSTRING_PUT macro to check for the utf8 flag and handle accordingly. That's exactly what it does, unless I've misunderstood. If it did, the explicit utf8::upgrade would not be necessary. I'm not an XS expert, so I asked some of my friends who are. They suggested to use SvPVutf8_force in WXSTRING_PUT. Does that sound sensible? -- Johan
Can we print UTF-8 chars in Wx::TextCtrl fields?
Hi,

I have a text field defined as:

    $self->{defs} = Wx::TextCtrl->new( $self->{panel}, -1, "", wxDefaultPosition, [ 500, 400 ],
        wxTE_MULTILINE | wxTE_READONLY | wxVSCROLL | wxTE_PROCESS_ENTER | wxTE_RICH2 );

And I am trying to set a font for it using:

    my $font = Wx::Font->new( $FontSize, wxFONTFAMILY_SWISS, wxFONTSTYLE_NORMAL, wxFONTWEIGHT_NORMAL, 0,
        "Arial Unicode MS", wxFONTENCODING_SYSTEM );
    my $style = Wx::TextAttr->new;
    $style->Wx::TextAttr::SetFont($font);
    $self->{defs}->SetDefaultStyle($style);

But if I print UTF-8 chars in this field, it prints squares instead of special UTF-8 chars (non ASCII). Should it work this way, and might the text be incorrectly encoded as UTF-8?

I also read in the WxPerl documentation:

    The known font encodings are:
    ...
    wxFONTENCODING_UTF8,    // UTF-8 Unicode encoding

So I tried to replace wxFONTENCODING_SYSTEM with wxFONTENCODING_UTF8 in the code above. But this just makes WxPerl pop up a window that says:

    Wx::SimpleApp: unknown encoding
    No font for displaying text in encoding 'Unicode 8 bit (UTF-8)' found.
    Would you like to select a font to be used for this encoding (otherwise the text in this encoding will not be shown correctly)?
    Yes   No

So I created another small program to find which encodings are available:

    use Wx ':everything';
    use Data::Dump 'pp';
    my $enum = Wx::FontEnumerator->new;
    my @encodings = $enum->GetEncodings;
    print pp \@encodings;

And the result was:

    [
      "WINDOWS-1250", "WINDOWS-437", "unknown-87", "WINDOWS-1252",
      "WINDOWS-1255", "WINDOWS-1256", "WINDOWS-1253", "WINDOWS-1254",
      "WINDOWS-1257", "WINDOWS-1251", "unknown--1", "WINDOWS-874",
      "unknown--1", "WINDOWS-932", "WINDOWS-949", "unknown--1",
      "WINDOWS-936", "WINDOWS-950",
    ]

So there is no encoding that contains UTF-8 in its name. Does this mean that WxPerl can't print UTF-8 texts? Or is there something else I need to do?
In the same documentation page, the description of the Wx::Font constructor explains the encoding parameter as:

    encoding
    An encoding which may be one of
    wxFONTENCODING_SYSTEM
    wxFONTENCODING_DEFAULT
    wxFONTENCODING_ISO8859_1...15
    wxFONTENCODING_KOI8
    wxFONTENCODING_CP1250...1252

Only these items appear in this list of possible encodings, and none of them includes UTF8.

If possible, is there an example of writing text with a certain font that supports UTF-8?

Thanks.

--Octavian