GetSelText loses Unicode

Robert May Sun, 17 Jul 2005 11:09:35 -0700

Glenn Linderman wrote:

Hi Rob,
I've been trying for a year to get some form of Unicode support into Win32::GUI, and you did it in an afternoon! (it seems)

I was aware that you had been looking at this, but hadn't realised the scope: I thought that you were looking for a 'full' Unicode build.

Congratulations, you must know a bunch more about the Windows API than I ever did (or ever want to, but gradually I'm being sucked in).

Probably not, but I'm a good reader, and have plenty of experience of interpreting other people's documents. I think you need a particular mindset to get anywhere with the MSDN documentation :-).

My failed attempts were firstly to compile with __UNICODE__ (or some such #define)


-DUNICODE    IIRC

turned on, which did nothing... my second attempt showed more promise: to define a Window class using a Unicode class name, which seems to be the "official" way of getting controls to use Unicode within such Windows.

Do you have a reference to this 'official' way of doing it? I have not seen this.

This second attempt ran into the problem that UCS2 Unicode (used by Windows) has all those interspersed NUL characters, and Win32::GUI has lots of code that infers lengths by looking for the single-byte NUL character...

Right. You can't just compile with -DUNICODE and expect it all to work, unless it was written right to start with. If we want to be able to do this, then we'll need to trawl the code and replace all type definitions for strings and characters with TCHAR and LPTSTR definitions (etc.). Also we'll need to do a trawl and correct any pointer arithmetic so that it can work in both ANSI and UNICODE worlds.
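
To make the interspersed-NUL problem concrete, here is a minimal sketch using the core Encode module. It is not Win32::GUI code, and the variable names are only illustrative. It encodes a short ASCII string as UTF-16LE (the byte layout Windows uses for wide strings) and shows where a C-style scan for the first single NUL byte would stop.

```perl
use strict;
use warnings;
use Encode ();

my $text = "Hello";

# Encode as UTF-16LE: each of these ASCII characters becomes two bytes,
# the second of which is a NUL byte.
my $wide = Encode::encode("UTF-16LE", $text);

# A strlen()-style scan stops at the first NUL byte it meets.
my $first_nul = index($wide, "\0");

printf "characters: %d, UTF-16LE bytes: %d, first NUL byte at offset: %d\n",
    length($text), length($wide), $first_nul;
```

For this input the first NUL byte sits at offset 1, so code that infers the length from it would see a one-byte string.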

and then Perl Unicode support being UTF8, the single-byte NUL character still "works", although it is inefficient to keep recalculating lengths... and then I haven't yet found an XS-callable API to convert from UCS2 to UTF8...

There are some helper functions/macros in ...Perl/Core/Win32.h, but I haven't looked at how they would get used yet. They wrap the Win32 API WideCharToMultiByte and MultiByteToWideChar functions.

I suppose one could call back to Perl's Encode module, but that seems inefficient to do in as many places as one would have to do it...

I believe that there are specific perlapi functions to do this. I think it was perldoc perlapi where I was looking at this.
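
For reference, this is roughly what the Perl-level conversion through Encode looks like. It is only a sketch of the UCS2-to-UTF8 step itself, not of the XS or perlapi route discussed above, and the sample bytes are made up for illustration.

```perl
use strict;
use warnings;
use Encode ();

# U+0041, U+0131 and U+263A laid out as UTF-16LE bytes (illustrative input).
my $ucs2_bytes = "\x41\x00\x31\x01\x3A\x26";

# Bytes -> Perl character string -> UTF-8 bytes.
my $chars      = Encode::decode("UTF-16LE", $ucs2_bytes);
my $utf8_bytes = Encode::encode("UTF-8", $chars);

printf "%d characters, %d UTF-8 bytes\n", length($chars), length($utf8_bytes);
```

The same text takes 6 bytes in either encoding here, but only the UTF-8 form is free of embedded NUL bytes.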

For my needs, I can keep most of the interfaces in English. I only need to input/output Unicode text via (1) the filesystem (2) a text string editing window (Textfield or RichEdit). So the code you posted might enable everything I _need_...


(1) is handled by PerlIO, and is quite well documented.

(2) I've attached below what I hope is enough for you (and anyone else) to achieve this with a RichEdit control.
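
For point (1), a minimal sketch of the PerlIO approach, assuming the files are stored as UTF-8. The temporary file and the variable names are only for illustration; pick whatever encoding layer matches your files.

```perl
use strict;
use warnings;
use File::Temp ();

my $text = "A \x{00ED} \x{0131} \x{263A}";

# Illustrative scratch file; removed automatically when the program exits.
my ($tmp_fh, $path) = File::Temp::tempfile(UNLINK => 1);
close $tmp_fh;

# Write through an encoding layer: characters in, UTF-8 bytes on disk.
open(my $out, ">:encoding(UTF-8)", $path) or die "Cannot write $path: $!";
print {$out} $text;
close($out) or die "Cannot close $path: $!";

# Read back through the same layer: UTF-8 bytes on disk, characters out.
open(my $in, "<:encoding(UTF-8)", $path) or die "Cannot read $path: $!";
my $read_back = do { local $/; <$in> };
close($in);

printf "%d characters, %d bytes on disk, round trip %s\n",
    length($read_back), -s $path, ($read_back eq $text ? "ok" : "FAILED");
```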

I'll be playing with the code you posted, and hoping that you can follow through with somewhat more official support in Win32::GUI. I'll be glad to help as I can with testing, but -- I'll email you off-list to discuss this more.

I'm afraid that Unicode support isn't high on my list of priorities, but I've added it to the list. I think full support in Win32::GUI will be significant work, as it wasn't written for this in the first place. I hope to be proven wrong.

I'll open this tracker in response to the email below. Perhaps it should be generalized to other controls containing text, such as ListBox, ComboBox, etc., if that is possible. I've already generalized it to Textfield ... thinking that it should be about the same support for Textfield and RichEdit, and maybe even easier for Textfield... and Textfield doesn't have the extra formatting baggage that RichEdit does, so could be simpler to use for simple text entry work.

Sadly, I think that doing this with anything other than RichEdit will require the changes I mention above. With RichEdit, we're fortunate to have an API that allows us to get the contents in whatever codepage we want, but the same is not true for the other controls.


Have a play with this.

Regards,
Rob.

#! perl -w
use strict;
use warnings;

use Win32::GUI;

my $w = Win32::GUI::Window->new(
        -name  => 'Main',
        -size  => [800, 600],
);

my $re = $w->AddRichEdit(
        -size  => [600, 400],
);

$w->AddButton(
        -name  => 'btnChange',
        -pos   => [600, 500],
        -text  => '$re->TextEx($re->TextEx())',
);

# Unicode representations:
# U+0041 LATIN CAPITAL LETTER A
# U+00ED LATIN SMALL LETTER I WITH ACUTE
# U+0131 LATIN SMALL LETTER DOTLESS I
# U+263A WHITE SMILING FACE
# NOTE: smiling face displayed as box, due to font not having this codepoint
my $init_text = "A: \x{0041} These, \x{00ED}, \x{0131}, are not exactly the letter i. \x{263A}";
display_unicode_string($init_text);

$re->TextEx($init_text);

$w->Show();
Win32::GUI::Dialog();
exit(0);

sub btnChange_Click {
        $re->TextEx(my $text = $re->TextEx());
        display_unicode_string($text);

        return 1;
}

# Print a perl string containing wide characters to a console that doesn't
# support such wide characters, escaping the wide characters as \x{....}
# and reporting the length of the string in *characters*
sub display_unicode_string
{
        print ">|" . nice_string($_[0]) . "|<(" . length($_[0]) . ")\n";
}

sub nice_string   # modified from perldoc perluniintro
{
        join("",
                map { $_ > 255 ?              # if wide character...
                        sprintf("\\x{%04X}", $_) :  # \x{...}
                        chr($_)                     # else as themselves
                } unpack("U*", $_[0]));       # unpack Unicode characters
}

package Win32::GUI::RichEdit;

use strict;
use warnings;
use Encode ();

# constants:
sub WM_USER()            {1024};
sub EM_GETTEXTEX()       {+WM_USER+94};
sub EM_GETTEXTLENGHTEX() {+WM_USER+95};
sub EM_SETTEXTEX()       {+WM_USER+97};
sub GT_DEFAULT()         {0};
sub GTL_PRECISE()        {2};
sub GTL_NUMBYTES()       {16};
sub ST_DEFAULT()         {0};
sub CP_UTF8()            {65001};

sub TextEx
{
        my $self = shift;
        my $text = shift;
        my $retval;

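        if (defined $text) {
                # set the text (defined-ness test, so "" and "0" can be set too)
                my $bytes = Encode::encode("UTF-8", $text);

                # SETTEXTEX structure: flags, codepage
                my $struct  = pack("LI", ST_DEFAULT, CP_UTF8);
                my $wparam  = unpack("L", pack("P8", $struct));

                # assign the outer $retval, so the result reaches the caller
                $retval = $self->SendMessage(EM_SETTEXTEX, $wparam, $bytes);
                warn("EM_SETTEXTEX failed") if $retval == 0;
        }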
        else {
                # read the text

                # find out how big a buffer we need:
                my $struct  = pack("LI", GTL_NUMBYTES|GTL_PRECISE, CP_UTF8);
                my $wparam  = unpack("L", pack("P8", $struct));

                my $bufsz = $self->SendMessage(EM_GETTEXTLENGHTEX, $wparam, 0);

                # allocate a buffer
                my $bytes = " " x $bufsz;

                # GETTEXTEX structure: cb, flags, codepage, lpDefaultChar, lpUsedDefChar
                my $struct2  = pack("LLIpp", $bufsz, GT_DEFAULT, CP_UTF8, undef, undef);
                my $wparam2  = unpack("L", pack("P20", $struct2));

                # use $self here, not the $re object from the main script
                my $numTchar = $self->SendMessage(EM_GETTEXTEX, $wparam2, $bytes);
                # TODO, probably should check the number of bytes returned, but $numTchar
                # is zero, unless the buffer is extended by one more byte, when its value
                # is one less than bufsz?  What I've done seems to work.
                # This is contrary to what MSDN says about EM_GETTEXTEX return value.

#warn("Problem with EM_GETTEXTEX (got $numTchar bytes, expecting $bufsz)") if $numTchar != $bufsz;

                $retval = Encode::decode("UTF-8", $bytes);
        }

        return $retval;
}

__END__

Re: [perl-win32-gui-users] RichEdit->Text or ->GetSelText loses Unicode
