Re: Interpretation of non-UTF8 strings

Jarkko Hietaniemi Mon, 16 Aug 2004 06:18:31 -0700

Some more thoughts that came to my mind while we were walking the dog...

Those who think switching on UTF-8 based on locale settings is a good
idea should spend some time with the Red Hat bug database.  RH 8 and 9
used an early prerelease version of Perl 5.8.0, which did switch on full
UTF-8-ness based on locale settings.  This turned out to be quite a mess
because RH8/9 had such locales *by default* - *every* RH8/9 user was
subjected to full UTF-8, e.g. UTF-8 I/O.


As for Latin-1 (or EBCDIC, if on EBCDIC) being silently the default,
yes, that was a mistake, and a culturally insensitive one at that.

But there is a simple workaround for that, as perluniintro would tell
you: the encoding pragma.

A further helper is the encoding::warnings pragma from Autrijus Tang;
you can find it on CPAN.
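
To make the Latin-1 default concrete, here is a minimal sketch.  It uses
the core Encode module to decode explicitly rather than the encoding
pragma itself, and the sample byte string and variable names are made up
for illustration only.  The same two bytes come out as two characters
when read as Latin-1 and as one character when read as UTF-8:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative input: the two bytes that encode U+00E9
# (LATIN SMALL LETTER E WITH ACUTE) in UTF-8.
my $bytes = "\xC3\xA9";

# Reading the bytes as Latin-1: every byte becomes one character.
# This mirrors what the silent default assumes about byte strings.
my $as_latin1 = decode('ISO-8859-1', $bytes);

# Reading the same bytes as UTF-8: the pair becomes one character.
my $as_utf8 = decode('UTF-8', $bytes);

printf "as Latin-1: %d characters\n", length $as_latin1;   # 2
printf "as UTF-8:   %d character\n",  length $as_utf8;     # 1
```

Being explicit like this is more typing than a pragma, but it leaves no
doubt about which interpretation of the bytes was intended.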

Summary: I think Unicode is complex enough, and different users
and developers have different enough expectations and needs, and
different enough levels of understanding of what Unicode means, that
(assuming one wants to keep any level of backward compatibility) there
simply is no single set of "right things" to choose when implementing
Unicode support.  The best one can hope for is to build enough knobs
that users can pull and twist, to adjust to the level of Unicodeness
they want and need and understand.
