On Wed, Apr 7, 2010 at 17:42, Aristotle Pagaltzis <pagalt...@gmx.de> wrote:
> * Michael Ludwig <michael.lud...@xing.com> [2010-04-07 15:00]:
>> Having read Juerd's list of useful advice, I don't understand
>> the reason for its last three items:
>>
>> • utf8::upgrade before doing lc/lcfirst/uc
>> • utf8::upgrade before doing case insensitive matching
>> • utf8::upgrade before matching predefined character classes
>>   like \w and \s
>>
>> Can anyone enlighten me on the background of using
>> utf8::upgrade here?
>
> Perl versions up to the upcoming 5.12.0 (I think) are buggy in
> that they apply ISO-8859-1 semantics to downgraded strings and
> Unicode semantics to upgraded strings

This fix was withdrawn from 5.12.0.  Currently you have to "use
feature 'unicode_strings'" to get the sane behaviour in the current
lexical scope.  Current 'perldoc perlunicode' also says:

       The "use feature 'unicode_strings'" pragma is intended to
       always, regardless of platform, force Unicode semantics in
       a particular lexical scope.  In release 5.12, it is
       partially implemented, applying only to case changes.  See
       "The "Unicode Bug"" below.

This means that the utf8::upgrade() advice also applies to perl-5.12.0.
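
For illustration, here is a minimal sketch of the difference being
discussed.  It is not taken from perlunicode; the variable names and the
report() helper are made up for this example.  It assumes a perl of 5.12
or later (for the feature), with no locale pragma in effect.  On such a
perl I would expect the downgraded copy to come out "unchanged" and
"no match" and the upgraded copy "changed" and "match", even though the
two strings compare equal.

```perl
use strict;
use warnings;

# U+00E9 LATIN SMALL LETTER E WITH ACUTE, forced into the downgraded
# (one byte per character) internal representation.
my $down = "\xE9";
utf8::downgrade($down);

# Same character, but stored internally as UTF-8.
my $up = $down;
utf8::upgrade($up);

# Hypothetical helper: reports how uc and \w treat the string.
sub report {
    my ($label, $s) = @_;
    printf "%-11s uc: %s, \\w: %s\n", $label,
        (uc($s) eq "\xC9" ? "changed" : "unchanged"),
        ($s =~ /\w/       ? "match"   : "no match");
}

print "strings are equal\n" if $down eq $up;
report("downgraded", $down);
report("upgraded",   $up);

{
    # Requires perl 5.12 or later.  In 5.12 this only affects case
    # changes, so the \w result is left out here.
    use feature 'unicode_strings';
    printf "with feature, downgraded uc: %s\n",
        (uc($down) eq "\xC9" ? "changed" : "unchanged");
}
```

Calling utf8::upgrade() on the string first sidesteps the question of
which representation it happens to be in, which is why the advice is
still useful on 5.12.0.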

Regards,
Gisle


>                                         , even when they contain the
> same data. By upgrading your strings, you make sure that you get
> Unicode semantics consistently.
>
> Regards,
> --
> Aristotle Pagaltzis // <http://plasmasturm.org/>
>
