Graham Barr wrote:

> I find this very odd. The encoding pragma is not supposed to change how
> perl treats anything other than the actual script in the file that the
> pragma is in. It allows you to write in any encoding and perl will
> translate that to utf8 when it parses the script.
>
> Maybe the pragma is doing more than it should.

Oh no!

The "encoding" pragma does much more than just indicate the encoding of a
script. Here is a quote from the documentation:

The encoding pragma also modifies the filehandle layers of STDIN and
STDOUT to the specified encoding.

By default, if strings operating under byte semantics and strings with
Unicode character data are concatenated, the new string will be created
by decoding the byte strings as ISO 8859-1 (Latin-1).
The encoding pragma changes this to use the specified encoding instead.

For example:

    use encoding 'utf8';
    my $string = chr(20000); # a Unicode string
    utf8::encode($string);   # now it's a UTF-8 encoded byte string
    # concatenate with another Unicode string
    print length($string . chr(20000));

Will print 2, because $string is upgraded as UTF-8. Without "use
encoding 'utf8';", it will print 4 instead, since $string is three
octets when interpreted as Latin-1.
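
For comparison, here is a minimal sketch of the same concatenation done
without the pragma, once relying on the default Latin-1 upgrade and once
decoding the octets explicitly with utf8::decode() before concatenating.
The variable names are mine and purely illustrative; this is not taken
from the documentation, just one way to see the two behaviours side by
side.

```perl
use strict;
use warnings;

# No "use encoding" here, so the default Latin-1 upgrade rule applies.
my $unicode = chr(20000);      # one character, U+4E20
my $bytes   = $unicode;
utf8::encode($bytes);          # now three UTF-8 octets

# Default behaviour: each octet is upgraded as one Latin-1 character,
# so the result is 3 + 1 characters long.
my $implicit = length($bytes . $unicode);

# Explicit alternative: decode the octets first, so the result is
# 1 + 1 characters long.
my $decoded = $bytes;
utf8::decode($decoded);
my $explicit = length($decoded . $unicode);

print "implicit: $implicit\n";   # implicit: 4
print "explicit: $explicit\n";   # explicit: 2
```

Decoding explicitly keeps the effect local to the strings involved,
whereas the pragma changes how every such concatenation in its scope
behaves.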

--
Eugene Gladchenko,
EVG15-RIPE
