Tim Golden wrote:
> Tim Michelsen wrote:
>> Hello,
>> I want to process some files encoded in latin-1 (iso-8859-1) in my
>> python script that I write on Ubuntu which has UTF-8 as standard encoding.
>
> Not sure what you mean by "standard encoding" (is this an Ubuntu
> thing?)

Probably referring to the encoding the terminal application expects -
writing latin-1 chars when the terminal expects utf-8 will not work
well.

Python also has a default encoding, but that is ascii unless you change
it yourself.

> In this case, assuming you have files in iso-8859-1, something
> like this:
>
> <code>
> import codecs
>
> filenames = ['a.txt', 'b.txt', 'c.txt']
> for filename in filenames:
>   f = codecs.open (filename, encoding="iso-8859-1")
>   text = f.read ()
>   #
>   # If you want to re-encode this -- not sure why --

This is needed to put the text into the proper encoding for the
terminal. If you print a unicode string directly and Python cannot
determine the output encoding (as in the session below), it will be
encoded using the system default encoding (ascii), which will fail:

In [13]: print u'\xe2'
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
<type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\xe2' in position 0: ordinal not in range(128)

In [14]: print u'\xe2'.encode('utf-8')
â

>   # you could do this:
>   # text = text.encode ("utf-8")
>   print repr (text)

No, not repr, that will print with \ escapes and quotes.

In [15]: print repr(u'\xe2'.encode('utf-8'))
'\xc3\xa2'

And he may not want to change text itself to utf-8. Just
  print text.encode('utf-8')

Kent
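
Putting the two suggestions in this thread together, a minimal sketch
might look like the following. It is not taken from either message: the
helper names read_latin1 and to_terminal_bytes and the file name
sample.txt are made up for illustration, and it uses io.open instead of
codecs.open only so that the same code runs unchanged on Python 2.6+
and Python 3. It assumes the terminal expects utf-8.

```python
import io
import os
import shutil
import tempfile


def read_latin1(path):
    """Return the contents of an iso-8859-1 encoded file as unicode text."""
    with io.open(path, encoding="iso-8859-1") as f:
        return f.read()


def to_terminal_bytes(text, terminal_encoding="utf-8"):
    """Encode unicode text into the bytes the terminal is assumed to expect."""
    return text.encode(terminal_encoding)


# Demo: create a small latin-1 file (hypothetical name), read it back,
# and re-encode it for a utf-8 terminal.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with io.open(sample_path, "wb") as f:
    f.write(b"\xe2")  # a-circumflex as a single latin-1 byte

text = read_latin1(sample_path)        # unicode text, one character
utf8_bytes = to_terminal_bytes(text)   # two bytes in utf-8

# repr shows the escaped bytes, which is useful for checking the
# conversion but not for display to a user.
print(repr(utf8_bytes))

shutil.rmtree(tmp_dir)
```

The variable text stays a unicode string throughout; only the copy that
goes to the terminal is encoded, which matches the advice above not to
overwrite text with its utf-8 form.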