On Fri, Jan 24, 2003 at 03:23:49PM -0500, Tay, William wrote:
> Internally, Perl represent character strings in UTF8. The PerlIO layer for
> input and output enables other encodings to be used for STDIN, STDOUT,
> STDERR and filehandling operations. For instance, if ru_RU.KOI8-R is
> specified (use open ':encoding(ru_RU.KOI8-R)';) as the encoding for data
> coming from STDIN, it will be converted (by PerlIO ?) into UTF8 for internal
> representation, and from UTF8 to ru_RU.KOI8-R for STDOUT.

That's correct.

> Questions:
> 1. Before UTF8 is used as the internal character encoding (before 5.6 ?),
> what default encoding is used to represent data internally?

They are simply stored as byte streams, akin to C.

> 2. What are the measures taken for backward compatibility?

Strings are divided into two classes: Unicode strings and byte strings.
In all circumstances, unless explicitly requested otherwise, all data
defaults to the second class.

You can "promote" a string to Unicode by concatenating it with a Unicode
string, by explicitly asking for it via PerlIO layers, through
Encode::decode(), or by manually calling utf8::upgrade() on it.

Since none of those methods are present in older perls, compatibility is
maintained by default. [1]

Hope this helps,
/Autrijus/

[1] There is an exception: in Perl 5.8.0, if your locale indicates that
    you can handle UTF-8, all IO filehandles are marked as ':utf8'.
    This controversial behaviour will probably go away by Perl 5.8.1,
    where you will need to use "perl -C" explicitly to get it.
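
P.S. To make the promotion routes above concrete, here is a minimal
sketch, not a definitive recipe. The sample bytes, the variable names,
and the use of in-memory filehandles in place of STDIN/STDOUT are all
assumptions made for illustration. It uses the canonical Encode name
'koi8-r' rather than a locale-style name.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Sample data (an assumption for this sketch): the KOI8-R bytes for the
# three Cyrillic letters U+043C U+0438 U+0440.
my $bytes = "\xCD\xC9\xD2";

# 1. Nothing was requested, so this is a plain byte string.
my $flag_before = is_utf8($bytes) ? 1 : 0;

# 2. Encode::decode() turns the bytes into a Unicode string.
my $chars = decode('koi8-r', $bytes);

# 3. A PerlIO layer does the same conversion while reading. An in-memory
#    filehandle stands in for STDIN here.
open(my $in, '<:encoding(koi8-r)', \$bytes) or die "open: $!";
my $from_layer = <$in>;
close($in);

# 4. utf8::upgrade() changes only the internal representation; the bytes
#    are taken as Latin-1 code points, so the string still compares equal.
my $upgraded = "caf\xE9";
utf8::upgrade($upgraded);

# 5. Concatenating with a Unicode string promotes the byte string too.
my $joined = "caf\xE9" . "\x{263A}";

# 6. On output, the layer converts back to KOI8-R bytes.
my $out_bytes = '';
open(my $out, '>:encoding(koi8-r)', \$out_bytes) or die "open: $!";
print {$out} $chars;
close($out);

printf "before decode: unicode=%d length=%d\n",
    $flag_before, length($bytes);
printf "after decode:  unicode=%d length=%d\n",
    (is_utf8($chars) ? 1 : 0), length($chars);
printf "round trip ok: %s\n", ($out_bytes eq $bytes ? "yes" : "no");
```

Encode::is_utf8() is used here only to peek at which class a string
belongs to; ordinary code should rarely need to check the flag.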