[EMAIL PROTECTED] wrote:
>
> I read in the docs that Unicode was limited in perl. I would like to search
> for and replace a Unicode (2 byte) string in a file. If I want to simply
> print out each line in a Unicode file the following works just fine.
>
> $filename = "file.unicode";
> open(FILE, $filename) or die "Can't open '$filename': $!";
> while(<FILE>) {
> print $_;
> }
> close FILE;
>
> Yet if I try to match a particular string the string is not matched.
>
> $filename = "file.unicode";
> open(FILE, $filename) or die "Can't open '$filename': $!";
> while(<FILE>) {
> if(/Testing/) {
> print $_;
> }
> }
> close FILE;
>
> I know that "Testing" is in the file (of course there are two bytes per
> character) but it seems that Perl does not find it or does not properly
> convert the characters. Any suggestions as to how I might proceed?

The Unicode::String module provides for mapping between Unicode and
non-Unicode character sets.

As far as regular expressions go, if you spell them right, they can work
on Unicode.
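
For instance, if you're looking for a constant string in a 2-byte
(UTF-16, little-endian) file, you can do this:

    my $string = "Testing";
    # expand ASCII to UTF-16LE by adding a zero byte after each character
    # (not the most efficient, but...)
    $string =~ s/(.)/$1\x00/g;
    binmode(FILE);    # read raw bytes; Win32 text mode could alter them
    while (<FILE>)
    {
        if (/\Q$string\E/o)    # \Q...\E keeps the bytes literal
        {
            print;
        }
    }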
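
Here is a self-contained sketch of the same byte-level idea, in case it
helps to try it on a scratch file. The helper names utf16le_pattern and
matching_lines are made up for illustration, and it assumes the file is
little-endian with two bytes per character. It also sets $/ to the
two-byte newline so that each record ends on a character boundary.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex-safe UTF-16LE byte pattern from an
# ASCII word by appending a zero byte to every character.
sub utf16le_pattern {
    my ($ascii) = @_;
    (my $wide = $ascii) =~ s/(.)/$1\x00/gs;
    return quotemeta $wide;
}

# Hypothetical helper: return the raw lines of $filename that contain
# $ascii in its UTF-16LE form. Assumes a little-endian 2-byte file.
sub matching_lines {
    my ($filename, $ascii) = @_;
    my $pattern = utf16le_pattern($ascii);
    local $/ = "\n\x00";    # a UTF-16LE newline is two bytes
    open(my $fh, '<', $filename) or die "Can't open '$filename': $!";
    binmode($fh);           # raw bytes, no text-mode translation
    my @hits = grep { /$pattern/ } <$fh>;
    close $fh;
    return @hits;
}
```

Calling matching_lines("file.unicode", "Testing") would return the
matching lines still in their 2-byte form. Since the match is done on
bytes, it could in principle hit across a character boundary in text
that is not mostly ASCII, so treat it as a quick check rather than a
general solution.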

Perl 5.6 supports UTF-8 directly (even in regular expressions, as I
understand), though a 2-byte file like yours would still have to be
converted to UTF-8 first. It may be worth looking at the perldelta
documentation for that version.
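
If decoding is an option, the pattern can stay an ordinary one. The
sketch below assumes the Encode module, which ships with Perl 5.8 and
later (not with 5.6), and again assumes little-endian data. The helper
name grep_decoded is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw UTF-16LE bytes into Perl characters and
# return the lines that match an ordinary regular expression.
sub grep_decoded {
    my ($bytes, $regex) = @_;
    my $text = decode('UTF-16LE', $bytes);
    $text =~ s/^\x{FEFF}//;    # drop a leading byte-order mark, if any
    return grep { /$regex/ } split /^/, $text;
}
```

Here $bytes would be the whole file read after binmode, and the call
would look like grep_decoded($bytes, qr/Testing/). The returned lines
are decoded characters, so they need encoding again before being
written back out in the original 2-byte form.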
--
Ned Konz
currently: Stanwood, WA
email: [EMAIL PROTECTED]
homepage: http://www.bike-nomad.com