Re: [Tutor] input file encoding

Tim Golden Tue, 11 Sep 2007 01:06:47 -0700

Tim Michelsen wrote:
> Hello,
> I want to process some files encoded in latin-1 (iso-8859-1) in my 
> python script that I write on Ubuntu which has UTF-8 as standard encoding.


Not sure what you mean by "standard encoding" (is this an Ubuntu
thing?) but essentially whenever you're pulling stuff into Python
which is encoded and which you want to treat as Unicode, you need
to decode it explicitly, either on a string-by-string basis or by
using the codecs module to treat the whole of a file as encoded.
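
For the string-by-string route, a rough sketch might look like the following. The helper name read_latin1_lines and the throwaway sample file are made up purely for illustration; the only real calls are bytes.decode and the standard io, os and tempfile modules. The file is opened in binary mode so that nothing is decoded until you ask for it.

```python
import io
import os
import tempfile


def read_latin1_lines(path):
    """Read raw bytes and decode each line explicitly as iso-8859-1.

    Hypothetical helper, not part of any library.
    """
    lines = []
    with io.open(path, "rb") as f:          # binary mode: we get undecoded bytes
        for raw in f:
            lines.append(raw.decode("iso-8859-1"))
    return lines


# Build a small latin-1 sample file so the sketch is self-contained.
sample = u"caf\xe9\n"
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "wb") as f:
    f.write(sample.encode("iso-8859-1"))

decoded = read_latin1_lines(path)
os.remove(path)
print(repr(decoded))
```

This is mostly useful when only some of the data needs decoding, or when the bytes arrive from somewhere other than a file; for whole files the codecs approach is less typing.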

In this case, assuming you have files in iso-8859-1, something
like this:

<code>
import codecs

filenames = ['a.txt', 'b.txt', 'c.txt']
for filename in filenames:
   # codecs.open decodes on the fly, so f.read () hands back
   # unicode text; the with block closes the file afterwards
   with codecs.open (filename, encoding="iso-8859-1") as f:
      text = f.read ()
   #
   # If you want to re-encode this -- not sure why --
   # you could do this:
   # text = text.encode ("utf-8")
   print (repr (text))

</code>
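
If you do want the re-encoding hinted at in the comment above, for example to save a UTF-8 copy of each file, something along these lines should work. This is only a sketch: the function name latin1_to_utf8 and the file names are invented for the example, and it assumes the input really is iso-8859-1.

```python
import codecs
import os
import shutil
import tempfile


def latin1_to_utf8(src_path, dst_path):
    """Decode src_path as iso-8859-1 and write it back out as UTF-8.

    Hypothetical helper for illustration only.
    """
    with codecs.open(src_path, encoding="iso-8859-1") as src:
        text = src.read()                   # unicode text, already decoded
    with codecs.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)                     # encoded to UTF-8 on the way out


# Self-contained demo using a temporary directory and made-up file names.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "latin1.txt")
dst_path = os.path.join(workdir, "utf8.txt")
original = u"na\xefve caf\xe9\n"
with open(src_path, "wb") as f:
    f.write(original.encode("iso-8859-1"))

latin1_to_utf8(src_path, dst_path)

with open(dst_path, "rb") as f:
    utf8_bytes = f.read()
shutil.rmtree(workdir)
print(repr(utf8_bytes))
```

Each accented character takes one byte in iso-8859-1 and two in UTF-8, so the output file comes out slightly larger than the input.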

TJG
_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
