On Sep 23, 2005, at 15:29 PM, Eugene Gladchenko wrote:
Maybe the pragma is doing more than it should
Oh no!
The "encoding" pragma does much more than just indicate the encoding
of a script. Here is the quote:
The encoding pragma also modifies the filehandle layers of STDIN and
STDOUT to the specified encoding.
But we do not use STDIN/STDOUT so that should be a non-issue.
By default, if strings operating under byte semantics and strings with
Unicode character data are concatenated, the new string will be created
by decoding the byte strings as ISO 8859-1 (Latin-1).
The encoding pragma changes this to use the specified encoding instead.
For example:
    use encoding 'utf8';
    my $string = chr(20000); # a Unicode string
    utf8::encode($string);   # now it's a UTF-8 encoded byte string
    # concatenate with another Unicode string
    print length($string . chr(20000));
Will print 2, because $string is upgraded as UTF-8. Without "use
encoding 'utf8';", it will print 4 instead, since $string is three
octets when interpreted as Latin-1.
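To make the default behaviour concrete, here is a minimal sketch that leaves the pragma out entirely and uses only the built-in utf8::encode and utf8::decode functions. The variable names are illustrative and not taken from the quoted documentation, and the sketch assumes no encoding pragma is in effect anywhere in the script.

```perl
use strict;
use warnings;

# One character, U+4E20, held as a Unicode (character) string.
my $char = chr(20000);

# Encode a copy into UTF-8 octets; $bytes now has byte semantics.
my $bytes = $char;
utf8::encode($bytes);

# With no encoding pragma in effect, concatenation upgrades $bytes
# as Latin-1, so each of its three octets becomes one character.
my $mixed     = $bytes . $char;
my $mixed_len = length($mixed);

# Decoding the octets explicitly before concatenating yields two
# characters, independent of any pragma.
my $decoded = $bytes;
utf8::decode($decoded);
my $clean_len = length($decoded . $char);

print "without decoding: $mixed_len\n";
print "with utf8::decode: $clean_len\n";
```

Decoding explicitly at the boundary, as in the second half of the sketch, does not depend on how the pragma changes concatenation.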
Right, but the effect should be lexical. So other than strings that
are upgraded as UTF-8 being passed through to Net::LDAP as arguments,
it should have no effect. And Net::LDAP should already handle UTF-8
strings OK. But maybe there is one place that has been missed.
What I don't get is that the problem shows up during decoding, and all
strings that are fed into the decode come from the socket, so they
should not be affected by the pragma.
Graham.