On 7/18/06, James Y Knight <[EMAIL PROTECTED]> wrote:
> On Jul 18, 2006, at 1:54 PM, Martin v. Löwis wrote:
>
> > Mihai Ibanescu wrote:
> >> To follow up on my own email: it looks like, even though in some
> >> locale
> >> "INFO".lower() != "info"
> >>
> >> u"INFO".lower() == "info" (at least in the Turkish locale).
> >>
> >> Is that guaranteed, at least for now (for the current versions of
> >> python)?
> >
> > It's guaranteed for now; unicode.lower is not locale-aware.
>
> That seems backwards of how it should be ideally: the byte-string
> upper and lower should always do ascii uppering-and-lowering, and the
> unicode ones should do it according to locale. Perhaps that can be
> cleaned up in py3k?

No, you've got it backwards. 8-bit strings are assumed to be encoded
using the current locale's default encoding so upper and lower behave
locale-dependent. Unicode strings don't need the locale as additional
input for upper and lower; in a different locale you simply use a
different code point. I believe the original issue was that in
Turkish, there are i's with and without dots in both lower and upper
case. I'm guessing that the ASCII code points are used for lowercase
dotted i and uppercase undotted I; code points with the high bit set
are used for the uppercase dotted i and lowercase undotted I (which I
can't easily type here).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to