On Wed, Apr 7, 2010 at 17:42, Aristotle Pagaltzis <pagalt...@gmx.de> wrote:
> * Michael Ludwig <michael.lud...@xing.com> [2010-04-07 15:00]:
>> Having read Juerd's list of useful advice, I don't understand
>> the reason for its last three items:
>>
>> • utf8::upgrade before doing lc/lcfirst/uc
>> • utf8::upgrade before doing case insensitive matching
>> • utf8::upgrade before matching predefined character classes
>>   like \w and \s
>>
>> Can anyone enlighten me on the background of using
>> utf8::upgrade here?
>
> Perl versions up to the upcoming 5.12.0 (I think) are buggy in
> that they apply ISO-8859-1 semantics to downgraded strings and
> Unicode semantics to upgraded strings

This fix was withdrawn from 5.12.0.  Currently you have to "use
feature 'unicode_strings'" to get the sane behaviour in the current
lexical scope.  Current 'perldoc perlunicode' also says:

    The "use feature 'unicode_strings'" pragma is intended to always,
    regardless of platform, force Unicode semantics in a particular
    lexical scope.  In release 5.12, it is partially implemented,
    applying only to case changes.  See "The "Unicode Bug"" below.

This means that the utf8::upgrade() advice also applies to perl-5.12.0.

Regards,
Gisle

> , even when they contain the
> same data. By upgrading your strings, you make sure that you get
> Unicode semantics consistently.
>
> Regards,
> --
> Aristotle Pagaltzis // <http://plasmasturm.org/>
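
For illustration, here is a minimal sketch of the difference being discussed. It assumes a perl where neither "use feature 'unicode_strings'" nor "use locale" is in effect in the enclosing scope; the variable names are made up for this example and do not come from Juerd's list.

```perl
use strict;
use warnings;

# "\xE9" is LATIN SMALL LETTER E WITH ACUTE. Written as a literal like
# this, perl stores it downgraded (one byte per character).
my $down = "\xE9";

# Same character, but stored upgraded (internally UTF-8) after the call.
my $up = "\xE9";
utf8::upgrade($up);

# Helper for readable output (illustrative only).
sub yn { $_[0] ? "yes" : "no" }

# Both strings hold the same data and compare equal ...
print "same data:               ", yn($down eq $up), "\n";

# ... yet they can behave differently for character classes and case
# changes, because the semantics depend on the internal representation.
print "downgraded matches \\w:   ", yn($down =~ /\w/), "\n";
print "upgraded matches \\w:     ", yn($up   =~ /\w/), "\n";
print "uc() changes downgraded: ", yn(uc($down) ne $down), "\n";
print "uc() changes upgraded:   ", yn(uc($up)   ne $up), "\n";
```

Calling utf8::upgrade() on a string before lc/uc, case-insensitive matching, or matching \w and \s does not change which characters the string contains. It only changes the internal representation so that Unicode semantics are applied consistently.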