Peter Otten wrote:
>> You can replace ALL of this upshifting and accent removal in one blow
>> by using the string translate() method with a suitable table.
>
> Only if you convert to unicode first or if your data maintains 1 byte
> == 1 character, in particular it is not UTF-8.
>
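To make Peter's caveat concrete, here is a rough sketch of the translate() approach applied to decoded text. It is written in Python 3 syntax (where str is already unicode), not the Python 2 used elsewhere in this thread. ACCENT_TABLE and upshift_and_strip are names made up for illustration, and the table covers only a handful of characters:

```python
# Sketch only: Python 3 syntax. ACCENT_TABLE and upshift_and_strip are
# hypothetical names, and the table is deliberately incomplete.
ACCENT_TABLE = str.maketrans({
    "\u00c9": "E",  # E with acute
    "\u00c8": "E",  # E with grave
    "\u00c0": "A",  # A with grave
    "\u00c7": "C",  # C with cedilla
    "\u00d4": "O",  # O with circumflex
})

def upshift_and_strip(text):
    """Upper-case decoded text, then map accented capitals to plain ASCII."""
    return text.upper().translate(ACCENT_TABLE)

# In UTF-8 one accented character is more than one byte, which is why a
# byte-for-byte table cannot be applied to the raw, undecoded data.
utf8_length = len("\u00e9".encode("utf-8"))  # 2 bytes for one character

print(upshift_and_strip("\u00e9l\u00e8ve"))
```

The table works on characters, so the input has to be decoded first; the two-byte length of a single accented letter shows why the same trick fails on raw UTF-8.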
There's a nice little codec from Skip Montanaro for removing accents from
latin-1 encoded strings. It also has an error handler so you can convert
from unicode to ascii and strip all the accents as you do so:

http://orca.mojam.com/~skip/python/latscii.py

    >>> import latscii
    >>> import htmlentitydefs
    >>> print u'\u00c9'.encode('ascii','replacelatscii')
    E
    >>>

So Bussiere could replace a large chunk of his code with:

    ligneA = ligneA.decode(INPUTENCODING).encode('ascii', 'replacelatscii')
    ligneA = ligneA.upper()

INPUTENCODING is 'utf8' unless (one possible explanation for his problem)
his files are actually in some different encoding.

Unfortunately, just as I finished writing this I discovered that the
latscii module isn't as robust as I thought: it blows up on consecutive
accented characters. :(
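For what it's worth, one way to sidestep the consecutive-accents problem without latscii might be the standard unicodedata module: decompose each character, then drop the combining marks. This is a different technique from the latscii error handler, shown in Python 3 syntax, and to_ascii_upper is a name invented for this sketch. It assumes the input really is in the stated encoding:

```python
import unicodedata

def to_ascii_upper(raw, encoding="utf-8"):
    """Decode bytes, strip accents via NFKD decomposition, and upper-case.

    Hypothetical helper, not part of latscii. Characters that have no
    ASCII decomposition come out as '?'.
    """
    text = raw.decode(encoding)
    # NFKD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the non-combining characters, i.e. drop the accents.
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return stripped.encode("ascii", "replace").decode("ascii").upper()

# Consecutive accented characters, encoded as UTF-8 input bytes.
sample = "\u00c9\u00e9\u00e8 caf\u00e9".encode("utf-8")
print(to_ascii_upper(sample))
```

If the files turn out to be latin-1 rather than UTF-8, passing encoding="latin-1" is the only change needed.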