Hi list,

I am running Perl 5.6.1 on a Red Hat box, and I have come across this weird (bug|feature|annoying thing). If this problem has been raised before, please give me a reference to the "F" manual :-)
TEST SCRIPT:
============

use strict;
use utf8;

main();

sub main
{
    # \x{A9} is the copyright sign
    my $data = "Copyright \x{A9} 2001-2002 MKDoc Ltd";
    my $dlm  = '(?:\p{IsSpace}|\p{IsPunct})';
    my $re   = 'MKDoc';

    print "BEFORE: $data\n";
    my @split = $data =~ /^(.*?$dlm)($re)($dlm.*?)$/ism;
    $data = join '', @split;
    print "AFTER : $data\n";
}

1;

And here is what I get:

[jhiver@frogette mkdoc]$ perl -w test2.pl
BEFORE: Copyright © 2001-2002 MKDoc Ltd
AFTER : Copyright © 2001-2002 MKDoc Ltd
[jhiver@frogette mkdoc]$

My terminal doesn't support UTF-8, which in this case is good because I can see all the characters... Surprise: capturing with a regex seems to remove the string's utf8ness, although the string IS utf8 and 'use utf8' is there...

Any ideas?

Cheers,
-- 
IT'S TIME FOR A DIFFERENT KIND OF WEB
================================================================
Jean-Michel Hiver - Software Director
+44 (0)114 221 4968
================================================================
VISIT HTTP://WWW.MKDOC.COM