Re: utf - string translation

John Machin Wed, 29 Nov 2006 11:56:02 -0800

Frederic Rentsch wrote:

> Try this:
>
> from_characters   = \
> '\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xff\xe7\xe8\xe9\xea\xeb'
> to_characters     = \
> 'AAAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaaiiiionoooooouuuuyyceeee'
> translation_table = string.maketrans (from_characters, to_characters)
> translated_string = string.translate (original_string, translation_table)
>


A few observations on the above:

1. This assumes that "original_string" is a str object, and the text is
encoded in latin1 or similar (e.g. cp1252).
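
To make point 1 concrete, here is a minimal sketch. It is written for a
current Python 3 interpreter (where str is the Unicode type and bytes plays
the role of the old str), not for the Python 2 used in the quoted code. The
sample word and the one-entry table are made up for illustration. It shows
the same text as latin1 bytes and as UTF-8 bytes, and that decoding first
gives one code point per character, so a per-character map works whatever
the original encoding was:

```python
raw_latin1 = b'caf\xe9'      # "cafe" with e-acute, encoded as latin1: 1 byte
raw_utf8 = b'caf\xc3\xa9'    # same text as UTF-8: e-acute takes 2 bytes

# A byte-for-byte table built for latin1 would meet \xc3 and \xa9 in the
# UTF-8 data, never \xe9, so it cannot work there. Decode first instead.
text_a = raw_latin1.decode('latin1')
text_b = raw_utf8.decode('utf-8')
assert text_a == text_b

# Illustrative one-entry map keyed by code point, as str.translate expects.
table = {0xe9: 'e'}
result = text_a.translate(table)
print(result)
```

If the input might be UTF-8 (the subject line suggests it could be), this
decode-then-translate route is probably the safer one.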

2. Presentation of the map could be improved greatly, along the lines
of:

import pprint
import unicodedata
fromc = \
[snip]
toc = 'AAAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaaiiiionoooooouuuuyyceeee'
assert len(fromc) == len(toc)
tups = list(zip(unicode(fromc, 'latin1'), toc))
tups.sort()
tupsu = [(x[1], x[0], unicodedata.name(x[0], '** no name **'))
         for x in tups]
pprint.pprint(tupsu)

which produces:

[('A', u'\xc0', 'LATIN CAPITAL LETTER A WITH GRAVE'),
 ('A', u'\xc1', 'LATIN CAPITAL LETTER A WITH ACUTE'),
[snip]
 ('D', u'\xd0', 'LATIN CAPITAL LETTER ETH'),
[snip]
 ('Y', u'\xdd', 'LATIN CAPITAL LETTER Y WITH ACUTE'),
 ('a', u'\xe0', 'LATIN SMALL LETTER A WITH GRAVE'),
[snip]
 ('o', u'\xf0', 'LATIN SMALL LETTER ETH'),
[snip]
 ('y', u'\xfd', 'LATIN SMALL LETTER Y WITH ACUTE'),
 ('y', u'\xff', 'LATIN SMALL LETTER Y WITH DIAERESIS')]

This makes it a lot easier to see what is going on, and check for
weirdness, like the inconsistent treatment of \xd0 and \xf0.
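
That kind of weirdness can also be hunted mechanically. The following is
only a sketch, in Python 3 syntax, using a six-entry excerpt of the map
rather than the whole thing; the helper name case_mismatches is made up for
this example. It reports every upper-case letter whose lower-case partner
maps to something other than the lower-cased target:

```python
# Excerpt of the posted map, keyed by code point (Python 3 str.translate style).
table = {
    0xc0: 'A', 0xe0: 'a',   # A / a with grave
    0xd0: 'D', 0xf0: 'o',   # capital and small eth, as in the posted map
    0xdd: 'Y', 0xfd: 'y',   # Y / y with acute
}

def case_mismatches(table):
    """Return (upper, lower, upper_target, lower_target) for inconsistent pairs."""
    found = []
    for cp, target in sorted(table.items()):
        ch = chr(cp)
        low = ch.lower()
        if low == ch or ord(low) not in table:
            continue  # not an upper-case letter, or no lower-case entry
        low_target = table[ord(low)]
        if low_target != target.lower():
            found.append((ch, low, target, low_target))
    return found

mismatches = case_mismatches(table)
print(mismatches)
```

On this excerpt only the eth pair is reported.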

3. The listing also makes it easier to check for missing maps. The OP may
be working only with French text, and may not care about Icelandic and
German letters, but other readers who stumble on this (and miss past
threads on this topic) may like something done with \xde (capital thorn),
\xfe (small thorn) and \xdf (sharp s aka Eszett).
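
A rough sketch of such a check, again in Python 3 syntax. The map is the one
from the quoted post; the replacements 'Th', 'th' and 'ss' are my own
assumption about what a reader might want, not something from the thread.
A dict passed to str.translate can map one character to several, which a
256-byte string.maketrans table cannot do:

```python
import unicodedata

fromc = ('\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf'
         '\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd'
         '\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xec\xed\xee\xef'
         '\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xff'
         '\xe7\xe8\xe9\xea\xeb')
toc = 'AAAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaaaiiiionoooooouuuuyyceeee'
assert len(fromc) == len(toc)

# Code point -> replacement text, the form str.translate accepts.
table = dict(zip(map(ord, fromc), toc))

# Letters in U+00C0..U+00FF with no entry (the multiplication and division
# signs at U+00D7 and U+00F7 are not letters, so they are skipped).
missing = [chr(cp) for cp in range(0xc0, 0x100)
           if cp not in table
           and unicodedata.category(chr(cp)).startswith('L')]
for ch in missing:
    print(repr(ch), unicodedata.name(ch))

# Assumed multi-character replacements for the gaps.
table.update({0xde: 'Th', 0xfe: 'th', 0xdf: 'ss'})
sample = 'Stra\xdfe'.translate(table)
print(sample)
```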

Cheers,
John

