On Thu, Feb 28, 2002 at 03:58:33PM +0000, Jean-Michel Hiver wrote:
> Hi again,
>
> Sorry to send so many messages, but one of my colleagues told me that
> the sample script I've sent wasn't clear enough. So here is my problem
> stripped down as much as I can:
>
> [jhiver@frogette mkdoc]$ cat test2.pl
> use strict;
> use utf8;
>
> my $data = "Copyright \x{A9} 2001-2002 MKDoc Ltd";
> print $data, "\n";
> print $data =~ /(.*)/, "\n";
>
>
> [jhiver@frogette mkdoc]$ perl test2.pl
> Copyright © 2001-2002 MKDoc Ltd
> Copyright © 2001-2002 MKDoc Ltd
>
>
> As you can see, the string has been converted from utf-8 to latin1 just
> by capturing the string... How come? How to avoid it? I've performed
> several 'perl unicode regex capture' like searches on google but came
> with no relevant hits :-(

What you are seeing is a bug in Perl 5.6.1. The upcoming 5.8.0 has this fixed.

> Cheers,
> --
> IT'S TIME FOR A DIFFERENT KIND OF WEB
> ================================================================
> Jean-Michel Hiver - Software Director
> [EMAIL PROTECTED]
> +44 (0)114 221 4968
> ================================================================
> VISIT HTTP://WWW.MKDOC.COM

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
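
P.S. A minimal sketch for re-checking the capture on a newer perl, assuming Perl 5.8.0 or later (where binmode accepts the ':utf8' layer). The $captured variable, the binmode call and the final comparison are illustrative additions and not part of the original test2.pl. The script prints the string and its capture through a UTF-8 output layer, then compares the two as character strings.

```perl
use strict;
use warnings;

# Assumption: Perl 5.8.0 or later, so PerlIO layers are available.
# Encode every character written to STDOUT as UTF-8.
binmode(STDOUT, ":utf8");

# \x{A9} is U+00A9 COPYRIGHT SIGN, as in the original test2.pl.
my $data = "Copyright \x{A9} 2001-2002 MKDoc Ltd";

# Capture the whole string in list context (hypothetical helper variable).
my ($captured) = $data =~ /(.*)/;

print $data, "\n";
print $captured, "\n";

# Compare as character strings, independent of the internal representation.
print $captured eq $data
    ? "capture matches original\n"
    : "capture differs\n";
```

On a perl without the bug, both printed strings should be identical and the last line should read "capture matches original".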