On Fri, Jan 24, 2003 at 03:23:49PM -0500, Tay, William wrote:
> Internally, Perl represent character strings in UTF8. The PerlIO layer for
> input and output enables other encodings to be used for STDIN, STDOUT,
> STDERR and filehandling operations. For instance, if ru_RU.KOI8-R is
> specified (use open ':encoding(ru_RU.KOI8-R)';) as the encoding for data
> coming from STDIN, it will be converted (by PerlIO ?) into UTF8 for internal
> representation, and from UTF8 to ru_RU.KOI8-R for STDOUT.
That's correct.
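To make the round trip concrete, here is a minimal sketch using in-memory
filehandles instead of STDIN and STDOUT. It assumes the Encode name
"koi8-r" for the layer (the locale-style name ru_RU.KOI8-R may not be
recognised by Encode), and the sample bytes are only an illustration.

```perl
use strict;
use warnings;

# Three KOI8-R bytes (sample data for illustration): the Cyrillic
# letters U+043C, U+0438, U+0440.
my $koi8_bytes = "\xCD\xC9\xD2";

# Reading through the :encoding layer decodes the bytes into characters.
open(my $in, '<:encoding(koi8-r)', \$koi8_bytes) or die "open for read: $!";
my $text = <$in>;
close $in;

printf "read %d characters, first is U+%04X\n", length($text), ord($text);

# Writing through the same layer encodes the characters back to KOI8-R.
my $out_bytes = '';
open(my $out, '>:encoding(koi8-r)', \$out_bytes) or die "open for write: $!";
print {$out} $text;
close $out;

printf "wrote %d bytes, round trip %s\n",
    length($out_bytes), ($out_bytes eq $koi8_bytes ? "ok" : "differs");
```

The same layer specification can be applied to an already open handle
with binmode(), e.g. binmode(STDIN, ':encoding(koi8-r)').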
> Questions:
> 1. Before UTF8 is used as the internal character encoding (before 5.6 ?),
> what default encoding is used to represent data internally?
They are simply stored as byte streams, akin to C.
> 2. What are the measures taken for backward compatibility?
Strings are divided into two classes: Unicode strings and byte strings.
In all circumstances, unless explicitly requested, all data defaults to the
second class. You can "promote" a string to Unicode by concatenating it
with a Unicode string, by asking for it explicitly via PerlIO layers, by
passing it through Encode::decode(), or by calling utf8::upgrade() on it
manually.
Since all those methods are not present in older perls, compatibility
is maintained by default. [1]
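
As a rough sketch of the promotion routes above (the PerlIO route aside),
the following uses utf8::is_utf8() to inspect which class a string is in.
The sample bytes and variable names are made up for illustration, and
"koi8-r" is assumed to be the encoding of the data.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A plain byte string: nothing has asked for Unicode semantics yet.
my $bytes = "\xCD\xC9\xD2";
printf "bytes:    unicode=%d length=%d\n",
    (utf8::is_utf8($bytes) ? 1 : 0), length($bytes);

# 1. Concatenating with a Unicode string promotes the result.
#    The bytes are treated as Latin-1 characters, not decoded as KOI8-R.
my $joined = $bytes . "\x{043C}";
printf "joined:   unicode=%d length=%d\n",
    (utf8::is_utf8($joined) ? 1 : 0), length($joined);

# 2. Encode::decode() interprets the bytes in the named encoding.
my $decoded = decode('koi8-r', $bytes);
printf "decoded:  unicode=%d first=U+%04X\n",
    (utf8::is_utf8($decoded) ? 1 : 0), ord($decoded);

# 3. utf8::upgrade() changes only the internal representation;
#    the characters still compare equal to the original string.
my $upgraded = $bytes;
utf8::upgrade($upgraded);
printf "upgraded: unicode=%d same=%d\n",
    (utf8::is_utf8($upgraded) ? 1 : 0), ($upgraded eq $bytes ? 1 : 0);
```

Note that only decode() (or a PerlIO :encoding layer) actually converts
from a legacy encoding; concatenation and utf8::upgrade() just reinterpret
the existing bytes as Latin-1 characters.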
Hope this helps,
/Autrijus/
[1] There is an exception: in Perl 5.8.0, if your locale indicates that
you can handle UTF-8, all IO filehandles are marked as ':utf8'.
This controversial behaviour will probably go away by Perl 5.8.1,
where you will need to use "perl -C" explicitly to get it.
