John Sampson wrote:
I notice that the string method 'lower' seems to convert some strings
(input from a text file) to Unicode but not others.
This messes up sorting if it is used on arguments of 'sorted' since
Unicode strings come before ordinary ones.
Is there a better way of
John Sampson wrote:
I notice that the string method 'lower' seems to convert some strings
(input from a text file) to Unicode but not others.
I don't think so. You're going to have to show an example.
I *think* what you might be running into is an artifact of printing to a
terminal, which may
I notice that the string method 'lower' seems to convert some strings
(input from a text file) to Unicode but not others.
This messes up sorting if it is used on arguments of 'sorted' since
Unicode strings come before ordinary ones.
Is there a better way of case-insensitive sorting of strings
John Sampson wrote:
I notice that the string method 'lower' seems to convert some strings (input
from a text file) to Unicode but not others.
This messes up sorting if it is used on arguments of 'sorted' since Unicode
strings come before ordinary ones.
I doubt that. Can you provide a short
On Sat, Jan 24, 2015 at 4:53 AM, Peter Otten __pete...@web.de wrote:
Now the same with unicode. To read text with a specific encoding use either
codecs.open() or io.open() instead of the built-in (replace utf-8 with your
actual encoding):
import io
for line in io.open("tmp.txt",
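
Since the snippet above is cut off, here is a minimal sketch of the approach it describes: decode the file with io.open() so that every line is unicode text, then sort with a lowercasing key. The helper names (read_lines, sort_case_insensitive), the temporary demo file, and the utf-8 encoding are assumptions made for illustration, not part of the original post. It is written for Python 3; the u prefixes are only there so the literals stay text on Python 2 as well.

```python
import io
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Return the lines of a text file, decoded to text, without newlines."""
    with io.open(path, encoding=encoding) as f:
        return [line.rstrip(u"\n") for line in f]


def sort_case_insensitive(lines):
    """Sort text strings, comparing their lowercased forms."""
    return sorted(lines, key=lambda s: s.lower())


# Demo: write a small UTF-8 file (hypothetical content), read it back, sort it.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(demo_path, "w", encoding="utf-8") as f:
    f.write(u"banana\nApple\n\u00c9clair\ncherry\n")

demo_sorted = sort_case_insensitive(read_lines(demo_path))
os.remove(demo_path)
print(demo_sorted)
```

Because every line is decoded the same way, the sort never compares byte strings with unicode strings. The ordering is still by code point, so the accented word sorts after the unaccented ones.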
On Sat, Jan 24, 2015 at 6:14 AM, Marko Rauhamaa ma...@pacujo.net wrote:
Well, if Python can't, then who can? Probably nobody in the world, not
generically, anyway.
Example:
print("re\u0301sume\u0301")
résumé
print("r\u00e9sum\u00e9")
résumé
print("re\u0301sume\u0301" ==
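
The comparison above is cut off, so as a rough sketch of the point being made: the two spellings render identically but are different code point sequences, and they only compare equal after normalization. Using unicodedata.normalize() with the NFC form is one common way to handle this; it is an assumption here, not something the original post prescribes. The helper name nfc is hypothetical, and Python 3 is assumed.

```python
import unicodedata

# Same word, two encodings: combining acute accents vs. precomposed letters.
decomposed = u"re\u0301sume\u0301"
precomposed = u"r\u00e9sum\u00e9"


def nfc(s):
    """Return s in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", s)


raw_equal = decomposed == precomposed
normalized_equal = nfc(decomposed) == nfc(precomposed)

print(raw_equal)         # False: the code point sequences differ
print(normalized_equal)  # True: both normalize to the precomposed form
print(len(decomposed), len(precomposed))  # 8 6
```

Normalizing both sides before comparing or sorting removes this particular ambiguity; it does not settle language-specific collation order.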
Peter Otten __pete...@web.de:
The standard recommendation is to convert bytes to unicode as early as
possible and only manipulate unicode.
Unicode doesn't get you off the hook (as you explain later in your
post). Upper/lowercase conversion as well as collation order are ambiguous. Python
even with decent