Glenn Linderman wrote:
Hi Rob,
I've been trying for a year to get some form of Unicode support into
Win32::GUI, and you did it in an afternoon! (it seems)
I was aware that you had been looking at this, but hadn't realised the
scope: I thought that you were looking for a 'full' Unicode build.
Congratulations, you must know a bunch more about the Windows API
than I ever did (or ever want to, but gradually I'm being sucked in).
Probably not, but I'm a good reader, and have plenty of experience of
interpreting other people's documents. I think you need a particular
mindset to get anywhere with the MSDN documentation :-).
My failed attempts were firstly to compile with __UNICODE__ (or some
such #define)
-DUNICODE IIRC
turned on, which did nothing... my second attempt showed more promise:
to define a Window class using a Unicode class name, which seems to be
the "official" way of getting controls to use Unicode within such Windows.
Do you have a reference to this 'official' way of doing it, as I have
not seen this.
This second attempt ran into the problem that UCS2 Unicode (used by
Windows) has all those interspersed NUL characters, and Win32::GUI has
lots of code that infers lengths by looking for the single-byte NUL
character...
Right. You can't just compile with -DUNICODE and expect it all to work,
unless it was written right to start with. If we want to be able to do
this, then we'll need to trawl the code and replace all type definitions
for strings and characters with TCHAR and LPTSTR definitions (etc.).
Also we'll need to do a trawl and correct any pointer arithmetic so
that it can work in both ANSI and UNICODE worlds.
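To illustrate the NUL problem described above, here is a minimal sketch in pure Perl. The helper c_strlen is hypothetical: it only mimics what a C routine scanning for a single NUL byte would report, and is not code from Win32::GUI.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: the length a byte-wise scan for a single NUL
# byte (C strlen style) would report for a buffer.
sub c_strlen {
    my ($bytes) = @_;
    my $pos = index($bytes, "\0");
    return $pos < 0 ? length($bytes) : $pos;
}

my $text  = "Hi";
my $utf16 = Encode::encode("UTF-16LE", $text) . "\0\0";  # wide NUL terminator
my $utf8  = Encode::encode("UTF-8",    $text) . "\0";    # single-byte terminator

# The UTF-16LE buffer holds 4 bytes of text, but the scan stops at the
# high byte of the first character.
printf "UTF-16LE: %d bytes of text, byte-wise scan sees %d\n",
    length($utf16) - 2, c_strlen($utf16);
printf "UTF-8:    %d bytes of text, byte-wise scan sees %d\n",
    length($utf8) - 1, c_strlen($utf8);
```

Any code path that measures strings this way would need a character-width-aware length function before a wide build could work.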
and then Perl Unicode support being UTF8, the single-byte NUL
character still "works", although it is inefficient to keep
recalculating lengths... and then I haven't yet found an XS-callable
API to convert from UCS2 to UTF8...
There are some helper functions/macros in ...Perl/Core/Win32.h, but I
haven't looked at how they would get used yet. They wrap the win32api
WideCharToMultiByte and MultiByteToWideChar functions.
I suppose one could call back to Perl's Encode module, but that seems
inefficient to do in as many places as one would have to do it...
I believe that there are specific perlapi functions to do this. I think
it was perldoc perlapi where I was looking at this.
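For comparison, the Encode route mentioned above might look like the sketch below. It runs at the Perl level rather than in XS, and the function names utf16le_to_utf8 and utf8_to_utf16le are made up for illustration. It assumes the wide buffer is little-endian UTF-16 without a BOM, as the Windows wide APIs use.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: UTF-16LE bytes (as returned by a wide Windows
# API) to UTF-8 bytes, going through a perl character string.
sub utf16le_to_utf8 {
    my ($wide) = @_;
    my $chars = Encode::decode("UTF-16LE", $wide);
    return Encode::encode("UTF-8", $chars);
}

# Hypothetical helper: the reverse conversion.
sub utf8_to_utf16le {
    my ($bytes) = @_;
    my $chars = Encode::decode("UTF-8", $bytes);
    return Encode::encode("UTF-16LE", $chars);
}

my $wide = "\x3A\x26";   # U+263A WHITE SMILING FACE as UTF-16LE bytes
my $utf8 = utf16le_to_utf8($wide);

# Show the resulting bytes in hex; prints: E2 98 BA
print join(" ", map { sprintf "%02X", ord } split //, $utf8), "\n";
```

Each conversion makes two passes over the data, which is the inefficiency being worried about if it had to be done at every call into the API.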
For my needs, I can keep most of the interfaces in English. I only
need to input/output Unicode text via (1) the filesystem (2) a text
string editing window (Textfield or RichEdit). So the code you posted
might enable everything I _need_...
(1) is handled by perlIO, and is quite well documented.
(2) I've attached below what I hope is enough for you (and anyone else)
to achieve this with a Richedit control.
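For point (1), a minimal sketch of reading and writing UTF-8 text through perlIO encoding layers. The helper names and the temporary file are illustrative only, and it assumes the files on disk are UTF-8.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: write a perl character string to disk as UTF-8.
sub write_utf8_file {
    my ($path, $text) = @_;
    open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot write $path: $!";
    print {$out} $text;
    close($out) or die "Cannot close $path: $!";
}

# Hypothetical helper: read a whole UTF-8 file back as characters.
sub read_utf8_file {
    my ($path) = @_;
    open(my $in, '<:encoding(UTF-8)', $path) or die "Cannot read $path: $!";
    local $/;               # slurp the whole file in one read
    my $text = <$in>;
    close($in);
    return $text;
}

my ($fh, $path) = File::Temp::tempfile(UNLINK => 1);
close($fh);

my $text = "\x{00ED}, \x{0131}, \x{263A}";
write_utf8_file($path, $text);
my $back = read_utf8_file($path);

# length() counts characters; -s reports the encoded size in bytes.
printf "%d characters, %d bytes on disk\n", length($back), (-s $path);
```

The string that comes back from read_utf8_file is a character string, so it can be passed directly to something like the TextEx method below.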
I'll be playing with the code you posted, and hoping that you can
follow through with somewhat more official support in Win32::GUI.
I'll be glad to help as I can with testing, but -- I'll email you
off-list to discuss this more.
I'm afraid that Unicode support isn't high on my list of priorities, but
I've added it to the list. I think full support in Win32::GUI will be
significant work, as it wasn't written for this in the first place. I
hope to be proven wrong.
I'll open this tracker in response to the email below. Perhaps it
should be generalized to other controls containing text, such as
ListBox, ComboBox, etc., if that is possible. I've already
generalized it to Textfield ... thinking that it should be about the
same support for Textfield and RichEdit, and maybe even easier for
Textfield... and Textfield doesn't have the extra formatting baggage
that RichEdit does, so could be simpler to use for simple text entry
work.
Sadly, I think that doing this with anything other than RichEdit will
require the changes I mention above. With RichEdit, we're fortunate to
have an API that allows us to get the contents in whatever codepage we
want, but the same is not true for the other controls.
Have a play with this.
Regards,
Rob.
#! perl -w
use strict;
use warnings;
use Win32::GUI;
my $w = Win32::GUI::Window->new(
-name => 'Main',
-size => [800, 600],
);
my $re = $w->AddRichEdit(
-size => [600, 400],
);
$w->AddButton(
-name => 'btnChange',
-pos => [600, 500],
-text => '$re->TextEx($re->TextEx())',
);
# Unicode representations:
# U+0041 LATIN CAPITAL LETTER A
# U+00ED LATIN SMALL LETTER I WITH ACUTE
# U+0131 LATIN SMALL LETTER DOTLESS I
# U+263A WHITE SMILING FACE
# NOTE: smiling face displayed as box, due to font not having this codepoint
my $init_text = "A: \x{0041} These, \x{00ED}, \x{0131}, are not exactly the letter i. \x{263A}";
display_unicode_string($init_text);
$re->TextEx($init_text);
$w->Show();
Win32::GUI::Dialog();
exit(0);
sub btnChange_Click {
$re->TextEx(my $text = $re->TextEx());
display_unicode_string($text);
return 1;
}
# Print a perl string containing wide characters to a console that doesn't
# support such wide characters, escaping the wide characters as \x{....}
# and reporting the length of the string in *characters*
sub display_unicode_string
{
print ">|" . nice_string($_[0]) . "|<(" . length($_[0]) . ")\n";
}
sub nice_string # modified from perldoc perluniintro
{
join("",
map { $_ > 255 ? # if wide character...
sprintf("\\x{%04X}", $_) : # \x{...}
chr($_) # else as themselves
} unpack("U*", $_[0])); # unpack Unicode characters
}
package Win32::GUI::RichEdit;
use strict;
use warnings;
use Encode ();
# constants:
sub WM_USER() {1024};
sub EM_GETTEXTEX() {+WM_USER+94};
sub EM_GETTEXTLENGTHEX() {+WM_USER+95};
sub EM_SETTEXTEX() {+WM_USER+97};
sub GT_DEFAULT() {0};
sub GTL_PRECISE() {2};
sub GTL_NUMBYTES() {16};
sub ST_DEFAULT() {0};
sub CP_UTF8() {65001};
sub TextEx
{
my $self = shift;
my $text = shift;
my $retval;
if(defined $text) {
# set the text: encode the perl string as UTF-8 bytes and pass a
# pointer to a SETTEXTEX struct (flags, codepage) as wParam
my $bytes = Encode::encode("UTF-8", $text);
my $struct = pack("LI", ST_DEFAULT, CP_UTF8);
my $wparam = unpack("L", pack("P8", $struct));
my $ok = $self->SendMessage(EM_SETTEXTEX, $wparam, $bytes);
warn("EM_SETTEXTEX failed") if $ok == 0;
}
else {
# read the text
# find out how big a buffer we need:
my $struct = pack("LI", GTL_NUMBYTES|GTL_PRECISE, CP_UTF8);
my $wparam = unpack("L", pack("P8", $struct));
my $bufsz = $self->SendMessage(EM_GETTEXTLENGTHEX, $wparam, 0);
# allocate a buffer
my $bytes = " " x $bufsz;
# GETTEXTEX struct: cb, flags, codepage, lpDefaultChar, lpUsedDefChar
my $struct2 = pack("LLIpp", $bufsz, GT_DEFAULT, CP_UTF8,
undef, undef);
my $wparam2 = unpack("L", pack("P20", $struct2));
my $numTchar = $self->SendMessage(EM_GETTEXTEX, $wparam2, $bytes);
# TODO, probably should check the number of bytes returned, but $numTchar
# is zero, unless the buffer is extended by one more byte, when its value
# is one less than bufsz? What I've done seems to work.
# This is contrary to what MSDN says about the EM_GETTEXTEX return value.
#warn("Problem with EM_GETTEXTEX (got $numTchar bytes, expecting $bufsz)") if $numTchar != $bufsz;
$retval = Encode::decode("UTF-8", $bytes);
}
return $retval;
}
__END__