Glenn Linderman wrote:

Hi Rob,

I've been trying for a year to get some form of Unicode support into Win32::GUI, and you did it in an afternoon! (it seems)

I was aware that you had been looking at this, but hadn't realised the scope: I thought that you were looking for a 'full' Unicode build.

Congratulations, you must know a bunch more about the Windows API than I ever did (or ever want to, but gradually I'm being sucked in).

Probably not, but I'm a good reader, and have plenty of experience of interpreting other people's documents. I think you need a particular mindset to get anywhere with the MSDN documentation :-).

My failed attempts were firstly to compile with __UNICODE__ (or some such #define)

-DUNICODE    IIRC

turned on, which did nothing... my second attempt showed more promise: to define a Window class using a Unicode class name, which seems to be the "official" way of getting controls to use Unicode within such Windows.

Do you have a reference for this 'official' way of doing it? I have not seen it.

This second attempt ran into the problem that UCS2 Unicode (used by Windows) has all those interspersed NUL characters, and Win32::GUI has lots of code that infers lengths by looking for the single-byte NUL character...

Right. You can't just compile with -DUNICODE and expect it all to work, unless it was written right to start with. If we want to be able to do this, then we'll need to trawl the code and replace all type definitions for strings and characters with TCHAR and LPTSTR definitions (etc.). We'll also need to trawl for and correct any pointer arithmetic so that it works in both the ANSI and UNICODE worlds.

and then Perl Unicode support being UTF8, the single-byte NUL character still "works", although it is inefficient to keep recalculating lengths... and then I haven't yet found an XS-callable API to convert from UCS2 to UTF8...

There are some helper functions/macros in ...Perl/Core/Win32.h, but I haven't looked at how they would get used yet. They wrap the win32api WideCharToMultiByte and MultiByteToWideChar functions.

I suppose one could call back to Perl's Encode module, but that seems inefficient to do in as many places as one would have to do it...

I believe that there are specific perlapi functions to do this. I think it was perldoc perlapi where I was looking at this.
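
For comparison, the pure-Perl route through Encode mentioned above can be sketched as follows. This is only a minimal sketch: it assumes the Windows side hands back UTF-16LE bytes, and the helper name wide_to_utf8 is made up for illustration (it is not part of Win32::GUI or perlapi).

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: convert a buffer of UTF-16LE bytes (as a wide-character
# Windows API would return them) into UTF-8 encoded bytes.
sub wide_to_utf8 {
    my ($wide_bytes) = @_;
    my $chars = Encode::decode("UTF-16LE", $wide_bytes);  # bytes -> Perl characters
    return Encode::encode("UTF-8", $chars);               # characters -> UTF-8 bytes
}

# "A" followed by U+00ED, as UTF-16LE: note the interspersed NUL bytes.
my $wide = "\x41\x00\xED\x00";
my $utf8 = wide_to_utf8($wide);
printf "%d wide bytes -> %d UTF-8 bytes\n", length($wide), length($utf8);
```

Doing this at every API boundary is the inefficiency being discussed, so it is probably best kept for the few places where text actually crosses into Perl.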

For my needs, I can keep most of the interfaces in English. I only need to input/output Unicode text via (1) the filesystem (2) a text string editing window (Textfield or RichEdit). So the code you posted might enable everything I _need_...

(1) is handled by perlIO, and is quite well documented.
(2) I've attached below what I hope is enough for you (and anyone else) to achieve this with a Richedit control.
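
For (1), a minimal sketch of writing and reading UTF-8 text through a PerlIO encoding layer. The temporary file and the sample text are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp ();

# Illustrative temporary file; any path would do.
my $tmp  = File::Temp->new(SUFFIX => '.txt');
my $path = $tmp->filename;

my $text = "i, \x{00ED}, \x{0131}";

# Write: the :encoding layer turns Perl characters into UTF-8 bytes.
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot write $path: $!";
print {$out} $text;
close($out) or die "Cannot close $path: $!";

# Read: the same layer decodes the bytes back into characters.
open(my $in, '<:encoding(UTF-8)', $path) or die "Cannot read $path: $!";
my $read = do { local $/; <$in> };
close($in);

print "round trip ", ($read eq $text ? "ok" : "FAILED"), ", ",
      length($read), " characters\n";
```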

I'll be playing with the code you posted, and hoping that you can follow through with somewhat more official support in Win32::GUI. I'll be glad to help as I can with testing, but -- I'll email you off-list to discuss this more.

I'm afraid that Unicode support isn't high on my list of priorities, but I've added it to the list. I think full support in Win32::GUI will be significant work, as it wasn't written for this in the first place. I hope to be proven wrong.

I'll open this tracker in response to the email below. Perhaps it should be generalized to other controls containing text, such as ListBox, ComboBox, etc., if that is possible. I've already generalized it to Textfield ... thinking that it should be about the same support for Textfield and RichEdit, and maybe even easier for Textfield... and Textfield doesn't have the extra formatting baggage that RichEdit does, so could be simpler to use for simple text entry work.

Sadly, I think that doing this with anything other than RichEdit will require the changes I mention above. With RichEdit, we're fortunate to have an API that allows us to get the contents in whatever codepage we want, but the same is not true for the other controls.

Have a play with this.

Regards,
Rob.

#! perl -w
use strict;
use warnings;

use Win32::GUI;

my $w = Win32::GUI::Window->new(
        -name  => 'Main',
        -size  => [800, 600],
);

my $re = $w->AddRichEdit(
        -size  => [600, 400],
);

$w->AddButton(
        -name  => 'btnChange',
        -pos   => [600, 500],
        -text  => '$re->TextEx($re->TextEx())',
);

# Unicode representations:
# U+0041 LATIN CAPITAL LETTER A
# U+00ED LATIN SMALL LETTER I WITH ACUTE
# U+0131 LATIN SMALL LETTER DOTLESS I
# U+263A WHITE SMILING FACE
# NOTE: smiling face displayed as box, due to font not having this codepoint
my $init_text = "A: \x{0041} These, \x{00ED}, \x{0131}, are not exactly the letter i. \x{263A}";
display_unicode_string($init_text);

$re->TextEx($init_text);

$w->Show();
Win32::GUI::Dialog();
exit(0);

sub btnChange_Click {
        $re->TextEx(my $text = $re->TextEx());
        display_unicode_string($text);

        return 1;
}

# Print a perl string containing wide characters to a console that doesn't
# support such wide characters, escaping the wide characters as \x{....}
# and reporting the length of the string in *characters*
sub display_unicode_string
{
        print ">|" . nice_string($_[0]) . "|<(" . length($_[0]) . ")\n";
}

sub nice_string   # modified from perldoc uniintro
{
        join("",
                map { $_ > 255 ?              # if wide character...
                        sprintf("\\x{%04X}", $_) :  # \x{...}
                        chr($_)                     # else as themselves
                } unpack("U*", $_[0]));       # unpack Unicode characters
}

package Win32::GUI::RichEdit;

use strict;
use warnings;
use Encode ();

# constants:
sub WM_USER()            {1024};
sub EM_GETTEXTEX()       {+WM_USER+94};
sub EM_GETTEXTLENGTHEX() {+WM_USER+95};
sub EM_SETTEXTEX()       {+WM_USER+97};
sub GT_DEFAULT()         {0};
sub GTL_PRECISE()        {2};
sub GTL_NUMBYTES()       {16};
sub ST_DEFAULT()         {0};
sub CP_UTF8()            {65001};

# Get or set the control's text as a perl character string, passing
# UTF-8 encoded bytes to and from the RichEdit control.
sub TextEx
{
        my $self = shift;
        my $text = shift;
        my $retval;

        if(defined $text) {
                # set the text
                my $bytes = Encode::encode("UTF-8", $text);

                # SETTEXTEX structure: flags, codepage
                my $struct  = pack("LI", ST_DEFAULT, CP_UTF8);
                # address of $struct as an integer (assumes 32-bit pointers)
                my $wparam  = unpack("L", pack("P8", $struct));

                $retval = $self->SendMessage(EM_SETTEXTEX, $wparam, $bytes);
                warn("EM_SETTEXTEX failed") if $retval == 0;
        }
        else {
                # read the text

                # find out how big a buffer we need:
                # GETTEXTLENGTHEX structure: flags, codepage
                my $struct  = pack("LI", GTL_NUMBYTES|GTL_PRECISE, CP_UTF8);
                my $wparam  = unpack("L", pack("P8", $struct));

                my $bufsz = $self->SendMessage(EM_GETTEXTLENGTHEX, $wparam, 0);

                # allocate a buffer
                my $bytes = " " x $bufsz;

                # GETTEXTEX structure: cb, flags, codepage, lpDefaultChar, lpUsedDefChar
                my $struct2  = pack("LLIpp", $bufsz, GT_DEFAULT, CP_UTF8, undef, undef);
                my $wparam2  = unpack("L", pack("P20", $struct2));

                my $numTchar = $self->SendMessage(EM_GETTEXTEX, $wparam2, $bytes);
                # TODO: probably should check the number of bytes returned, but
                # $numTchar is zero unless the buffer is extended by one more
                # byte, when its value is one less than $bufsz?  What I've done
                # seems to work.  This is contrary to what MSDN says about the
                # EM_GETTEXTEX return value.
                #warn("Problem with EM_GETTEXTEX (got $numTchar bytes, expecting $bufsz)") if $numTchar != $bufsz;
                $retval = Encode::decode("UTF-8", $bytes);
        }

        return $retval;
}

__END__


