Re: [Israel.pm] \w for utf8

Yuval Kogman Mon, 20 Aug 2007 05:45:39 -0700

use utf8;

tells perl that the current source file is encoded in utf8, so string
literals in it are decoded as utf8 (as opposed to latin1).
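
As a rough sketch of that case, assuming the script itself is saved as
UTF-8 and the output goes to a UTF-8 terminal (the variable names are
taken from the quoted question below, the rest is illustrative only):

```perl
use strict;
use warnings;
use utf8;                                  # assumption: this source file is saved as UTF-8

binmode(STDOUT, ':encoding(UTF-8)');       # encode characters again on output

my $str = 'N-Größe';                       # now 7 characters, not 9 bytes
my @res = ( $str =~ /(\w+)/g );            # \w uses Unicode semantics here
print join(' ++ ', @res), "\n";
```

Without the "use utf8;" line the literal is treated as raw bytes, which
is what produces the split result shown in the question.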

Since your string is likely coming from elsewhere, look into
binmode($fh, ":utf8") and open($fh, "<:utf8", $file), and also
Encode::decode.

These are the common methods to get a string to be marked as unicode
in memory, at which point the regex engine treats \w+ as really all
alphanumerical characters, not only [a-zA-Z0-9_].
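
A minimal sketch of two of those methods, assuming the data arrives as
raw UTF-8 bytes. The helper names words_from_bytes and words_from_handle
are made up for illustration, and an in-memory filehandle stands in for
a real file:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw UTF-8 bytes, then match word characters.
sub words_from_bytes {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> Perl character string
    return $chars =~ /(\w+)/g;
}

# Hypothetical helper: let an I/O layer do the decoding while reading.
sub words_from_handle {
    my ($bytes) = @_;
    open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open failed: $!";
    my $line = <$fh>;
    close($fh);
    return $line =~ /(\w+)/g;
}

# "N-Größe" as UTF-8 bytes: ö is C3 B6, ß is C3 9F
my $raw = "N-Gr\xC3\xB6\xC3\x9Fe";

binmode(STDOUT, ':encoding(UTF-8)');       # encode characters again on output
print join(' ++ ', words_from_bytes($raw)),  "\n";
print join(' ++ ', words_from_handle($raw)), "\n";
```

Both lines should come out as "N ++ Größe". The same layer string can be
given to binmode($fh, ...) on a handle that is already open.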

There is a tutorial by Juerd somewhere; it's supposed to be pretty
good. Try Google, perhaps.

On Mon, Aug 20, 2007 at 15:39:58 +0300, Pinkhas Nisanov wrote:
> Hi,
> 
> I need to catch strings that may include 'utf8' characters,
> e.g.:
> 
>   my $str_utf8 = 'N-Größe';
>   my @res = ( $str_utf8 =~ /(\w+)/g );
>   print join( " ++ ", @res ), "\n";
> 
> 
> it prints:
> 
>  N ++ Gr ++ e
> 
> but I need:
> 
> N ++ Größe
> 
> 
> thanks
> Pinkhas Nisanov
> _______________________________________________
> Perl mailing list
> [email protected]
> http://perl.org.il/mailman/listinfo/perl

-- 
  Yuval Kogman <[EMAIL PROTECTED]>
http://nothingmuch.woobling.org  0xEBD27418

