Re: Finding and replacing a UNICODE string.

Ned Konz Fri, 05 May 2000 22:28:26 -0700
[EMAIL PROTECTED] wrote:
> 
> I read in the docs that Unicode was limited in perl. I would like to search
> for and replace a Unicode (2 byte) string in a file. If I want to simply
> print out each line in a Unicode file the following works just fine.
> 
> $filename = "file.unicode";
> open(FILE, $filename) or die "Can't open '$filename': $!";
> while(<FILE>) {
>         print $_;
> }
> close FILE;
> 
> Yet if I try to match a particular string the string is not matched.
> 
> $filename = "file.unicode";
> open(FILE, $filename) or die "Can't open '$filename': $!";
> while(<FILE>) {
>         if(/Testing/) {
>                 print $_;
>         }
> }
> close FILE;
> 
> I know that "Testing" is in the file (of course there are two bytes per
> character) but it seems that Perl does not find it or does not properly
> convert the characters. Any suggestions as to how I might proceed?

The Unicode::String module provides for mapping between Unicode and
non-Unicode character sets.

As far as regular expressions go, if you spell them right, they can work
on Unicode.

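For instance, if you're looking for a constant string in a two-byte
(UTF-16, little-endian) file, you can do this:

my $string = "Testing";
# Expand ASCII to UTF-16LE by adding a zero byte after each character
# (not the most efficient, but...)
$string =~ s/(.)/$1\x00/g;
binmode(FILE);                  # read raw bytes (matters on Win32)
while (<FILE>)
{
        if (/\Q$string\E/o)     # \Q...\E keeps the bytes literal
        {
                print;
        }
}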
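
Since the question also asks about replacing, here is a minimal sketch
of a byte-level search and replace on two-byte data. The helper names
ascii_to_utf16le and replace_utf16le are made up for illustration, and
the sketch assumes the data is little-endian UTF-16 and that both the
search and replacement text are plain ASCII. It only accepts a match
that starts on an even byte offset, so a hit can't straddle two
characters.

```perl
use strict;
use warnings;

# Hypothetical helper: widen an ASCII string to UTF-16LE bytes
# (each character becomes one 16-bit little-endian value).
sub ascii_to_utf16le {
    my ($ascii) = @_;
    return pack("v*", unpack("C*", $ascii));
}

# Hypothetical helper: replace every aligned occurrence of $from with
# $to in a buffer of UTF-16LE bytes and return the new buffer.
sub replace_utf16le {
    my ($bytes, $from, $to) = @_;
    return $bytes if $from eq '';
    my $pat = ascii_to_utf16le($from);
    my $rep = ascii_to_utf16le($to);
    my $pos = 0;
    while (($pos = index($bytes, $pat, $pos)) >= 0) {
        if ($pos % 2 == 0) {
            # Match starts on a character boundary: replace it.
            substr($bytes, $pos, length($pat), $rep);
            $pos += length($rep);
        } else {
            # Match straddles two characters: skip it.
            $pos++;
        }
    }
    return $bytes;
}

# Demonstration on an in-memory buffer.
my $data = ascii_to_utf16le("Testing 1 2 3\r\n");
my $new  = replace_utf16le($data, "Testing", "Checked");
print length($new), " bytes after replacement\n";
```

For a real file you would open it, call binmode on the handle, read the
contents into a scalar, and write the returned buffer back out.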

Perl 5.6 supports UTF-8 directly (even in regular expressions, as I
understand).

Maybe you should look at the perldelta documentation for this version.
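
If a later Perl is available (5.8 or newer, which is an assumption
beyond the 5.6 mentioned above), the PerlIO encoding layer can decode
the file as it is read, so an ordinary pattern works without any byte
tricks. The following is only a sketch: the helper name
replace_in_utf16le_file is hypothetical, and it assumes the file is
little-endian UTF-16.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: copy $src to $dst, replacing $from with $to.
# Both files are treated as UTF-16LE (an assumption about the data).
# Returns the number of replacements made.
sub replace_in_utf16le_file {
    my ($src, $dst, $from, $to) = @_;
    open(my $in, "<:raw:encoding(UTF-16LE)", $src)
        or die "Can't open '$src': $!";
    open(my $out, ">:raw:encoding(UTF-16LE)", $dst)
        or die "Can't open '$dst': $!";
    my $count = 0;
    while (my $line = <$in>) {
        # $line holds decoded characters, so a plain pattern matches.
        $count += ($line =~ s/\Q$from\E/$to/g) || 0;
        print {$out} $line;
    }
    close $in;
    close $out or die "Can't close '$dst': $!";
    return $count;
}

# Demonstration: write a small UTF-16LE sample file, then rewrite it.
my (undef, $src) = tempfile(UNLINK => 1);
my (undef, $dst) = tempfile(UNLINK => 1);
open(my $w, ">:raw:encoding(UTF-16LE)", $src)
    or die "Can't open '$src': $!";
print {$w} "Testing one\nnot here\nTesting two\n";
close $w;

my $n = replace_in_utf16le_file($src, $dst, "Testing", "Checked");
print "replaced $n occurrence(s)\n";
```

If the file starts with a byte-order mark, the mark is passed through
as the first character of the first line.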


-- 
Ned Konz
currently: Stanwood, WA
email:     [EMAIL PROTECTED]
homepage:  http://www.bike-nomad.com

---
You are currently subscribed to perl-win32-users as: [archive@jab.org]