On Sep 23, 2005, at 15:29 PM, Eugene Gladchenko wrote:
Maybe the pragma is doing more than it should.


Oh no!

"encoding" pragma does much more than just indicating the encoding of a
script. Here is the quote:

The encoding pragma also modifies the filehandle layers of STDIN and
STDOUT to the specified encoding.

But we do not use STDIN/STDOUT so that should be a non-issue.

By default, if strings operating under byte semantics and strings with
Unicode character data are concatenated, the new string will be created
by decoding the byte strings as ISO 8859-1 (Latin-1).
The encoding pragma changes this to use the specified encoding instead.

For example:

    use encoding 'utf8';
    my $string = chr(20000); # a Unicode string
    utf8::encode($string);   # now it's a UTF-8 encoded byte string
    # concatenate with another Unicode string
    print length($string . chr(20000));

Will print 2, because $string is upgraded as UTF-8. Without "use
encoding 'utf8';", it will print 4 instead, since $string is three
octets when interpreted as Latin-1.
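
To see the Latin-1 default on its own, here is a minimal sketch that leaves the pragma out entirely. The sub name concat_lengths is made up for illustration; utf8::encode and utf8::decode are core functions. It compares the implicit upgrade on concatenation with an explicit UTF-8 decode of the same bytes:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: returns the length of the
# concatenated string with and without an explicit UTF-8 decode.
sub concat_lengths {
    my $char  = chr(20000);      # one Unicode character (U+4E20)
    my $bytes = $char;
    utf8::encode($bytes);        # now three UTF-8 octets, byte semantics

    # No pragma: the octets are upgraded as Latin-1, giving 3 + 1 characters.
    my $implicit = length($bytes . $char);

    # Decode the octets as UTF-8 first, giving 1 + 1 characters.
    my $decoded = $bytes;
    utf8::decode($decoded);
    my $explicit = length($decoded . $char);

    return ($implicit, $explicit);
}

my ($implicit, $explicit) = concat_lengths();
print "implicit Latin-1 upgrade: $implicit\n";   # 4
print "explicit UTF-8 decode:    $explicit\n";   # 2
```

Decoding explicitly at the point where bytes become text gives the same result whether or not the pragma is in effect.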

Right, but the effect should be lexical. So apart from strings that have been upgraded as UTF-8 being passed to Net::LDAP as arguments, it should have no effect. And Net::LDAP should already handle UTF-8 strings OK. But maybe there is one place that has been missed.

What I don't get is that the problem shows up during decoding, yet every string fed into the decode comes from the socket and so should not be affected.

Graham.
