hg wrote:
> Duncan Booth wrote:
> > hg <[EMAIL PROTECTED]> wrote:
> >
> >>> or in other words, put this at the top of your file (where "utf-8" is
> >>> whatever your editor/system is using):
> >>>
> >>>    # -*- coding: utf-8 -*-
> >>>
> >>> and use
> >>>
> >>>    u'<text>'
> >>>
> >>> for all non-ASCII literals.
> >>>
> >>> </F>
> >>>
> >> Hi,
> >>
> >> The problem is that:
> >>
> >> # -*- coding: utf-8 -*-
> >> import string
> >> print len('a')
> >> print len('à')
> >>
> >> returns 1 then 2
> >
> > And if you do what was suggested and write:
> >
> > # -*- coding: utf-8 -*-
> > import string
> > print len(u'a')
> > print len(u'à')
> >
> > then you get:
> >
> > 1
> > 1
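To see where the 1 and the 2 in the quoted exchange come from, here is a minimal sketch. It assumes the character in question is U+00E0 ('à') and spells it with an escape so the result does not depend on the editor's encoding; the variable names are made up for illustration. It is written to run under both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# One character, 'à', held as a unicode object.
text = u'\xe0'

# In Python 2, a plain 'à' literal in a UTF-8 source file holds these bytes.
utf8_bytes = text.encode('utf-8')

# The same character encoded as Latin1 is a single byte.
latin1_bytes = text.encode('latin-1')

# Show the representation, the type and the length of each object.
# The lengths are 1, 2 and 1 respectively.
for thing in (text, utf8_bytes, latin1_bytes):
    print(repr(thing), type(thing), len(thing))
```

len() counts characters for a unicode object and bytes for a byte string, so the same 'à' gives 1 or 2 depending on which kind of object holds it.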
Some general comments:

1. There has been at least one thread on the subject of ripping accents
off Latin1 characters in the last 3 or 4 months. Try Google.

2. About your earlier problem, when len(thing1) != len(thing2):
In that and similar situations, it can be *very* useful to use this
technique:

    print repr(thing1), type(thing1)
    print repr(thing2), type(thing2)

Go back now and try it out!

> OK,
>
> How would you handle the string.maketrans then ?
>

I suggest that you first read the documentation on the str and unicode
"translate" methods. You can obtain this quickly at the interactive
prompt by doing

    help(''.translate)

and

    help(u''.translate)

respectively.

Next steps:

Is your *real* data (not the examples you were hard-coding earlier)
encoded (latin1, utf8) in str objects or is it in unicode objects?
After reading previous posts my head is spinning & I'm not going to
guess; you determine it yourself.

[pseudocode -- blend of Pythonic & Knuthian styles]

if latin1:
    (A) you can use string.maketrans and str.translate immediately.
elif unicode:
    (B) either
        (1) encode to latin1; goto (A)
        or
        (2) use unicode.translate with do-it-yourself mapping
elif utf8:
    decode to unicode; goto (B)
else:
    ???

HTH,
John
-- 
http://mail.python.org/mailman/listinfo/python-list
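The pseudocode above could be sketched roughly as follows. This is only an illustration under stated assumptions: strip_accents, LATIN1_TABLE and UNICODE_MAP are hypothetical names, the mapping covers just four characters, and the caller is assumed to know the encoding of any byte string it passes in. The try/except is there so the same code runs on Python 2 (string.maketrans) and Python 3 (bytes.maketrans). Branch (B)(1) is left out; only (B)(2) is shown.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

try:
    from string import maketrans      # Python 2: works on str (byte strings)
except ImportError:
    maketrans = bytes.maketrans       # Python 3 equivalent for bytes

# Tiny illustrative sample, not a complete accent table: a-grave, e-grave,
# e-acute, c-cedilla.
LATIN1_TABLE = maketrans(b'\xe0\xe8\xe9\xe7', b'aeec')           # for (A)
UNICODE_MAP = {0xE0: u'a', 0xE8: u'e', 0xE9: u'e', 0xE7: u'c'}   # for (B)(2)


def strip_accents(data, encoding=None):
    """data is a unicode object, or a byte string plus its known encoding."""
    if not isinstance(data, bytes):
        # unicode: (B)(2), translate with a do-it-yourself mapping
        return data.translate(UNICODE_MAP)
    if encoding == 'latin-1':
        # (A): byte-for-byte translation table
        return data.translate(LATIN1_TABLE)
    if encoding == 'utf-8':
        # decode to unicode; goto (B)
        return strip_accents(data.decode('utf-8'))
    # the "???" branch: refuse to guess
    raise ValueError('unknown encoding: %r' % (encoding,))


print(repr(strip_accents(u'd\xe9j\xe0')))
print(repr(strip_accents(b'd\xe9j\xe0', 'latin-1')))
print(repr(strip_accents(u'd\xe9j\xe0'.encode('utf-8'), 'utf-8')))
```

Note that the Latin1 branch hands back a byte string while the other two hand back unicode; whether that is what you want depends on what the rest of your program expects.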