On Thu, Feb 28, 2002 at 03:58:33PM +0000, Jean-Michel Hiver wrote:
> Hi again,
> 
> Sorry to send so many messages, but one of my colleagues told me that
> the sample script I sent wasn't clear enough. So here is my problem,
> stripped down as much as I can:
> 
> [jhiver@frogette mkdoc]$ cat test2.pl 
> use strict;
> use utf8;
> 
> my $data = "Copyright \x{A9} 2001-2002 MKDoc Ltd";
> print $data, "\n";
> print $data =~ /(.*)/, "\n";
> 
> 
> [jhiver@frogette mkdoc]$ perl test2.pl 
> Copyright © 2001-2002 MKDoc Ltd
> Copyright © 2001-2002 MKDoc Ltd
> 
> 
> As you can see, the string has been converted from UTF-8 to Latin-1 just
> by capturing it. How come? How can I avoid it? I ran several Google
> searches along the lines of 'perl unicode regex capture' but came up
> with no relevant hits :-(

What you are seeing is a bug in Perl 5.6.1.  The upcoming 5.8.0
has this fixed.
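
If you want to check this once you have a newer perl, here is a minimal
sketch, assuming Perl 5.8.0 or later. The binmode() call, the $captured
variable and the final comparison are my additions for illustration and
are not part of your script. It sets an explicit UTF-8 output layer on
STDOUT and compares the captured string with the original, rather than
relying on how the terminal renders the bytes.

```perl
use strict;
use warnings;

# Assumption: Perl 5.8.0 or later, where PerlIO layers are available.
# Encode everything written to STDOUT as UTF-8.
binmode(STDOUT, ":utf8");

my $data = "Copyright \x{A9} 2001-2002 MKDoc Ltd";

# Capture the whole string in list context, as in the original test.
my ($captured) = $data =~ /(.*)/;

print $data, "\n";
print $captured, "\n";

# Compare the strings directly instead of eyeballing the output.
print $captured eq $data ? "identical\n" : "different\n";
```

With the output layer set explicitly, both lines should come out in the
same encoding, and the last line should report that the two strings are
identical.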

> Cheers,
> -- 
> IT'S TIME FOR A DIFFERENT KIND OF WEB
> ================================================================
>   Jean-Michel Hiver - Software Director
>   [EMAIL PROTECTED]
>   +44 (0)114 221 4968
> ================================================================
>                                       VISIT HTTP://WWW.MKDOC.COM

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
