On Jan 12, 8:25 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
> The section on "String Methods"[1] in the Python documentation states that for
> the case conversion methods like str.lower(), "For 8-bit strings, this method
> is locale-dependent." Is there a guarantee that unicode.lower() is
> locale-*in*dependent?
>
> The section on "Case Conversion" in PEP 100 suggests this, but the code itself
> looks like to may call the C function towlower() if it is available. On OS X
> Leopard, the manpage for towlower(3) states that it "uses the current locale"
> though it doesn't say exactly *how* it uses it.
>
> This is the bug I'm trying to fix:
>
> http://scipy.org/scipy/numpy/ticket/643
> http://dev.laptop.org/ticket/5559
>
> [1] http://docs.python.org/lib/string-methods.html
> [2] http://www.python.org/dev/peps/pep-0100/
>
The Unicode standard says that case mappings are language-dependent. It
gives the example of the Turkish dotted capital letter I and dotless
small letter i that "caused" the numpy problem. See
http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf#G21180

Here is what the Python 2.5.1 unicode implementation does in an
English-language locale:

>>> import unicodedata as ucd
>>> eyes = u"Ii\u0130\u0131"
>>> for eye in eyes:
...     print repr(eye), ucd.name(eye)
...
u'I' LATIN CAPITAL LETTER I
u'i' LATIN SMALL LETTER I
u'\u0130' LATIN CAPITAL LETTER I WITH DOT ABOVE
u'\u0131' LATIN SMALL LETTER DOTLESS I
>>> for eye in eyes:
...     print "%r %r %r %r" % (eye, eye.upper(), eye.lower(), eye.capitalize())
...
u'I' u'I' u'i' u'I'
u'i' u'I' u'i' u'I'
u'\u0130' u'\u0130' u'i' u'\u0130'
u'\u0131' u'I' u'\u0131' u'I'

The conversions for I and i are not correct for a Turkish locale.

I don't know how to repeat the above in a Turkish locale.

However it appears from your bug ticket that you have a much narrower
problem (case-shifting a small known list of English words like VOID)
and can work around it by writing your own locale-independent casing
functions. Do you still need to find out whether Python unicode
casings are locale-dependent?

Cheers,
John
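
P.S. For the narrower problem, a locale-independent casing function
could look roughly like the sketch below. It is only an illustration:
the names ascii_lower and ascii_upper are made up here (they are not
part of Python or numpy), and it assumes that only the ASCII letters
A-Z and a-z ever need to be case-shifted. Every other character,
including U+0130 and U+0131, is passed through unchanged.

```python
# Hypothetical helpers: ASCII-only case shifting through fixed tables.
# Nothing here consults the locale or the C library, so the result is
# the same whatever the current locale is.

_ASCII_UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
_ASCII_LOWER = "abcdefghijklmnopqrstuvwxyz"

# Character-to-character lookup tables built once at import time.
_TO_LOWER = dict(zip(_ASCII_UPPER, _ASCII_LOWER))
_TO_UPPER = dict(zip(_ASCII_LOWER, _ASCII_UPPER))


def ascii_lower(s):
    """Lower-case only A-Z; leave every other character untouched."""
    return "".join([_TO_LOWER.get(c, c) for c in s])


def ascii_upper(s):
    """Upper-case only a-z; leave every other character untouched."""
    return "".join([_TO_UPPER.get(c, c) for c in s])
```

Because the mapping is a fixed table, ascii_lower("VOID") gives "void"
regardless of the locale setting, which is all a small known list of
English keywords needs.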