Re: Understanding Unicode support in Perl

Autrijus Tang Sat, 25 Jan 2003 00:42:56 -0800

On Fri, Jan 24, 2003 at 03:23:49PM -0500, Tay, William wrote:
> Internally, Perl represents character strings in UTF8. The PerlIO layer for
> input and output enables other encodings to be used for STDIN, STDOUT,
> STDERR and filehandle operations. For instance, if ru_RU.KOI8-R is
> specified (use open ':encoding(ru_RU.KOI8-R)';) as the encoding for data
> coming from STDIN, it will be converted (by PerlIO ?) into UTF8 for internal
> representation, and from UTF8 to ru_RU.KOI8-R for STDOUT.


That's correct.
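A minimal sketch of that round trip, assuming Perl 5.8 or later with the core Encode and File::Temp modules. It uses the plain Encode name 'KOI8-R' for the layer, and the Cyrillic sample word is purely illustrative. Characters are encoded to KOI8-R bytes on write and decoded back into characters on read:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative sample: the Russian word "мир" written as Unicode code points.
my $text = "\x{43C}\x{438}\x{440}";

# Scratch file, deleted when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;

# Output layer: characters are encoded to KOI8-R bytes as they are written.
open my $out, '>:encoding(KOI8-R)', $path or die "open: $!";
print {$out} $text;
close $out or die "close: $!";

# Raw read: the octets actually on disk (KOI8-R uses one byte per Cyrillic letter).
open my $raw, '<:raw', $path or die "open: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Input layer: KOI8-R bytes are decoded back into characters as they are read.
open my $in, '<:encoding(KOI8-R)', $path or die "open: $!";
my $decoded = do { local $/; <$in> };
close $in;

printf "bytes on disk: %d, characters read: %d\n", length $bytes, length $decoded;
print $decoded eq $text ? "round trip ok\n" : "mismatch\n";
```

Note that `use open` on its own only sets the default layers for later open() calls in its lexical scope; to apply them to STDIN/STDOUT/STDERR as well, the pragma's ':std' subpragma is needed, e.g. `use open ':std', ':encoding(KOI8-R)';`.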

> Questions:
> 1. Before UTF8 was used as the internal character encoding (before 5.6 ?),
> what default encoding was used to represent data internally?

They were simply stored as byte streams, as in C.

> 2. What are the measures taken for backward compatibility?

Strings are divided into two classes: Unicode strings and byte strings.
Unless you explicitly request otherwise, all data defaults to the second
class.  You can "promote" a string to Unicode by concatenating it with a
Unicode string, by explicitly asking for it via PerlIO layers, by calling
Encode::decode(), or by calling utf8::upgrade() on it manually.
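A small sketch of three of those promotion routes (PerlIO layers aside), assuming Perl 5.8.1 or later. It uses utf8::is_utf8() purely to peek at Perl's internal flag for illustration; ordinary code should not depend on that flag. The sample strings are made up:

```perl
use strict;
use warnings;
use Encode qw(decode);

# A byte string: "café" as Latin-1 octets (0xE9 is é); not flagged as Unicode.
my $bytes = "caf\xE9";

# A code point above 0xFF, which Perl can only store as a Unicode string.
my $wide = "\x{263A}";

# 1. Concatenating with a Unicode string promotes the result.
my $joined = $bytes . $wide;

# 2. Encode::decode turns UTF-8 octets into a Unicode character string.
my $octets  = "caf\xC3\xA9";
my $decoded = decode('UTF-8', $octets);

# 3. utf8::upgrade changes only the internal representation, in place;
#    the string still compares equal to the original byte string.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

for my $pair ([bytes => $bytes], [joined => $joined],
              [decoded => $decoded], [upgraded => $upgraded]) {
    my ($name, $str) = @$pair;
    printf "%-8s utf8 flag: %d, length: %d\n",
        $name, (utf8::is_utf8($str) ? 1 : 0), length $str;
}
```

Only `$bytes` stays a byte string; the other three come out as Unicode strings, with lengths counted in characters (5, 4 and 4).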

Since none of these mechanisms exists in older perls, existing code does
not use them, and compatibility is maintained by default. [1]

Hope this helps,
/Autrijus/

[1] There is an exception: in Perl 5.8.0, if your locale indicates that
    you can handle UTF-8, all I/O filehandles are marked as ':utf8'.
    This controversial behaviour will probably go away in Perl 5.8.1,
    where you will need to run "perl -C" explicitly to get it.
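A minimal sketch for checking which layers a handle actually ended up with, and for requesting the encoding explicitly so the result does not depend on the locale or on command-line switches such as -C (for example `perl -CS script.pl`). It assumes Perl 5.8.1 or later, where PerlIO::get_layers() is available; the exact layer names printed vary by platform:

```perl
use strict;
use warnings;

# Layers on STDOUT as set up at startup (affected by -C / PERL_UNICODE,
# and in 5.8.0 by a UTF-8 locale).
my @before = PerlIO::get_layers(*STDOUT);

# Ask for the encoding explicitly instead of relying on the environment.
binmode STDOUT, ':encoding(UTF-8)' or die "binmode: $!";
my @after = PerlIO::get_layers(*STDOUT);

print "before: @before\n";
print "after:  @after\n";
```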
