Frederic Rentsch wrote:
> Try this:
>
> from_characters =
> '\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xff\xe7\xe8\xe9\xea\xeb'
> to_characters =
> 'AAAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaaiiiionoooooouuuuyyceeee'
> translation_table = string.maketrans (from_characters, to_characters)
> translated_string = string.translate (original_string, translation_table)
>

A few observations on the above:

1. This assumes that "original_string" is a str object, and the text is
encoded in latin1 or similar (e.g. cp1252).

2. Presentation of the map could be improved greatly, along the lines of:

    import pprint
    import unicodedata
    fromc = \
    [snip]
    toc = 'AAAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaaiiiionoooooouuuuyyceeee'
    assert len(fromc) == len(toc)
    tups = list(zip(unicode(fromc, 'latin1'), toc))
    tups.sort()
    tupsu = [(x[1], x[0], unicodedata.name(x[0], '** no name **'))
             for x in tups]
    pprint.pprint(tupsu)

which produces:

    [('A', u'\xc0', 'LATIN CAPITAL LETTER A WITH GRAVE'),
     ('A', u'\xc1', 'LATIN CAPITAL LETTER A WITH ACUTE'),
    [snip]
     ('D', u'\xd0', 'LATIN CAPITAL LETTER ETH'),
    [snip]
     ('Y', u'\xdd', 'LATIN CAPITAL LETTER Y WITH ACUTE'),
     ('a', u'\xe0', 'LATIN SMALL LETTER A WITH GRAVE'),
    [snip]
     ('o', u'\xf0', 'LATIN SMALL LETTER ETH'),
    [snip]
     ('y', u'\xfd', 'LATIN SMALL LETTER Y WITH ACUTE'),
     ('y', u'\xff', 'LATIN SMALL LETTER Y WITH DIAERESIS')]

This makes it a lot easier to see what is going on, and check for
weirdness, like the inconsistent treatment of \xd0 and \xf0.

3. ... and to check for missing maps. The OP may be working only with
French text, and may not care about Icelandic and German letters, but
other readers who stumble on this (and miss past thread(s) on this
topic) may like something done with \xde (capital thorn), \xfe (small
thorn) and \xdf (sharp s aka Eszett).

Cheers,
John
--
http://mail.python.org/mailman/listinfo/python-list
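
A rough sketch of points 1 and 3, written for Python 3 (where str is already Unicode, so '\xc0' means U+00C0 directly). It lists the Latin-1 letters that the quoted from_characters leaves out, and shows one possible way to handle them. The names missing_letters, EXTRA_MAP and translate_extras are made up for illustration, and the replacements chosen ('Th', 'th', 'ss', and 'd' for small eth) are only assumptions, not anything settled in this thread.

```python
import unicodedata

# The quoted from_characters, split over several lines for readability.
FROM_CHARACTERS = (
    '\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf'
    '\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd'
    '\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xec\xed\xee\xef'
    '\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xff'
    '\xe7\xe8\xe9\xea\xeb'
)


def missing_letters(mapped, lo=0xC0, hi=0xFF):
    """Return (char, name) pairs for letters in lo..hi that are not in mapped.

    Non-letters such as the multiplication and division signs are skipped.
    """
    return [
        (ch, unicodedata.name(ch, '** no name **'))
        for ch in map(chr, range(lo, hi + 1))
        if ch.isalpha() and ch not in mapped
    ]


# Hypothetical replacements (assumptions, pick whatever suits your data).
EXTRA_MAP = {
    '\xde': 'Th',  # LATIN CAPITAL LETTER THORN
    '\xfe': 'th',  # LATIN SMALL LETTER THORN
    '\xdf': 'ss',  # LATIN SMALL LETTER SHARP S
    '\xd0': 'D',   # LATIN CAPITAL LETTER ETH
    '\xf0': 'd',   # LATIN SMALL LETTER ETH (the quoted table maps it to 'o')
}


def translate_extras(raw_bytes, encoding='latin1'):
    """Decode with an explicit encoding (point 1), then apply EXTRA_MAP."""
    text = raw_bytes.decode(encoding)
    # A dict passed to str.maketrans may map one character to several.
    return text.translate(str.maketrans(EXTRA_MAP))


if __name__ == '__main__':
    for ch, name in missing_letters(FROM_CHARACTERS):
        print(ascii(ch), name)
    print(translate_extras(b'Stra\xdfe'))
```

Run as a script, this should report capital thorn, sharp s and small thorn as the unmapped letters, and print "Strasse" for the sample input. Passing a dict to str.maketrans is what allows the one-to-many replacements that equal-length from/to strings cannot express.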