Re: Case-insensitive sorting of strings (Python newbie)

2015-01-23 Thread Peter Otten
John Sampson wrote:
> I notice that the string method 'lower' seems to convert some strings (input from a text file) to Unicode but not others. This messes up sorting if it is used on arguments of 'sorted' since Unicode strings come before ordinary ones. Is there a better way of

Re: Case-insensitive sorting of strings (Python newbie)

2015-01-23 Thread Steven D'Aprano
John Sampson wrote:
> I notice that the string method 'lower' seems to convert some strings (input from a text file) to Unicode but not others.

I don't think so. You're going to have to show an example. I *think* what you might be running into is an artifact of printing to a terminal, which may

Case-insensitive sorting of strings (Python newbie)

2015-01-23 Thread John Sampson
I notice that the string method 'lower' seems to convert some strings (input from a text file) to Unicode but not others. This messes up sorting if it is used on arguments of 'sorted' since Unicode strings come before ordinary ones. Is there a better way of case-insensitive sorting of strings

Re: Case-insensitive sorting of strings (Python newbie)

2015-01-23 Thread Michael Ströder
John Sampson wrote:
> I notice that the string method 'lower' seems to convert some strings (input from a text file) to Unicode but not others. This messes up sorting if it is used on arguments of 'sorted' since Unicode strings come before ordinary ones.

I doubt that. Can you provide a short

Re: Case-insensitive sorting of strings (Python newbie)

2015-01-23 Thread Chris Angelico
On Sat, Jan 24, 2015 at 4:53 AM, Peter Otten __pete...@web.de wrote:
> Now the same with unicode. To read text with a specific encoding use either codecs.open() or io.open() instead of the built-in (replace utf-8 with your actual encoding):
>
> import io
> for line in io.open("tmp.txt",
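A minimal sketch of the approach quoted above: decode the file to text while reading it, then sort with a lowercasing key so that every comparison is between strings of the same type. The helper names, the sample words, and the temporary file are made up for illustration, UTF-8 is only an assumed encoding, and the snippet is written for Python 3 although io.open() is also available in Python 2.

```python
import io
import os
import tempfile

def read_lines(path, encoding="utf-8"):
    """Return the lines of a text file as decoded text, without newlines."""
    with io.open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]

def sort_ignoring_case(strings):
    # The key is applied to every item, so comparisons see only lowercased text.
    return sorted(strings, key=lambda s: s.lower())

# Demo data written to a throwaway file (hypothetical content).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"banana\nApple\n\u00e9cole\ncherry\n")

words = read_lines(path)
os.remove(path)

print(sort_ignoring_case(words))
```

Because the key function only affects comparison, the returned list keeps the original spelling of each word. Non-ASCII words are ordered by code point here, which is not necessarily the order a given language expects.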

Re: Case-insensitive sorting of strings (Python newbie)

2015-01-23 Thread Chris Angelico
On Sat, Jan 24, 2015 at 6:14 AM, Marko Rauhamaa ma...@pacujo.net wrote:
> Well, if Python can't, then who can?

Probably nobody in the world, not generically, anyway. Example:

>>> print("re\u0301sume\u0301")
résumé
>>> print("r\u00e9sum\u00e9")
résumé
>>> print("re\u0301sume\u0301" ==
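Independently of how the truncated example continues, the two spellings used in it can be compared with the standard unicodedata module. This is a sketch for Python 3; the helper name same_text and the choice of NFC as the normalization form are assumptions made for illustration.

```python
import unicodedata

decomposed = "re\u0301sume\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "r\u00e9sum\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE

def same_text(a, b, form="NFC"):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(decomposed == precomposed)            # False: different code point sequences
print(len(decomposed), len(precomposed))    # 8 6
print(same_text(decomposed, precomposed))   # True
```

Normalizing before comparing or sorting removes this particular ambiguity, but it does not settle language-specific ordering questions.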

Re: Case-insensitive sorting of strings (Python newbie)

2015-01-23 Thread Marko Rauhamaa
Peter Otten __pete...@web.de:
> The standard recommendation is to convert bytes to unicode as early as possible and only manipulate unicode.

Unicode doesn't get you off the hook (as you explain later in your post). Upper/lowercase as well as collation order is ambiguous. Python even with decent
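One concrete instance of the case-mapping ambiguity, sketched for Python 3 (str.casefold() needs 3.3 or later): the German sharp s has no single-character uppercase form in Python's default mapping, so lowercasing both sides is not enough to match every spelling. The sample words are chosen for illustration only.

```python
word = "stra\u00dfe"   # "straße"

print(word.upper())                                # STRASSE
print(word.upper().lower())                        # strasse (round trip changes the word)
print(word.lower() == "STRASSE".lower())           # False
print(word.casefold() == "STRASSE".casefold())     # True

# Using casefold() as the sort key treats all three spellings as equal,
# so the stable sort leaves them in their original order.
spellings = ["STRASSE", "stra\u00dfe", "Strasse"]
print(sorted(spellings, key=str.casefold))
```

Case folding addresses caseless matching only. Collation order still depends on language and locale and is not handled by this sketch.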