Problem reading file with umlauts

Claus Hausberger Tue, 07 Jul 2009 07:01:25 -0700

Hello

I have a text file which is encoded in Latin1 (ISO-8859-1). I can't change that 
as I do not create those files myself.


I have to read those files and convert the umlauts like ö to entities like &ouml; 
as the text files should become HTML files.

I have this code:


#!/usr/bin/python
# -*- coding: latin1 -*-

import codecs

f = codecs.open('abc.txt', encoding='latin1')

for line in f:
    print line
    for c in line: 
        if c == "ö":
            print "oe"
        else:
            print c


and I get this error message:

$ ./read.py
Abc

./read.py:11: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if c == "ö":
A
b
c



Traceback (most recent call last):
  File "./read.py", line 9, in <module>
    print line
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)




I checked the web and tried several approaches, but I still get strange 
encoding errors.
Has anyone ever done this before? 
I am currently using Python 2.5 and may be able to use 2.6 but I cannot yet 
move to 3.1 as many libs we use don't yet work with Python 3.

Any help is more than welcome.  This has been driving me crazy for two days now.

best wishes

Claus
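
For reference, here is a minimal sketch of one way the task described above
could be approached. It is not a definitive fix. The names ENTITIES,
to_html_entities and convert_file, and the output file name in the usage
note, are hypothetical and do not come from the original script. Two things
appear to cause the errors shown: the literal "ö" is a byte string while c is
a unicode character, so the comparison cannot convert both sides, and print
tries to encode the unicode line with the ASCII codec. The sketch compares
unicode with unicode (using escapes such as u"\xf6") and writes the result
through an explicit codec instead of print.

```python
import codecs

# Named HTML entities for the German umlauts and sharp s (extend as needed).
# Escapes such as u"\xf6" do not depend on the encoding of this source file.
ENTITIES = {
    u"\xe4": u"&auml;", u"\xf6": u"&ouml;", u"\xfc": u"&uuml;",
    u"\xc4": u"&Auml;", u"\xd6": u"&Ouml;", u"\xdc": u"&Uuml;",
    u"\xdf": u"&szlig;",
}


def to_html_entities(text):
    """Replace each mapped character in a unicode string with its entity."""
    return u"".join(ENTITIES.get(c, c) for c in text)


def convert_file(src, dst):
    """Read Latin-1 text from src and write ASCII-only text to dst."""
    fin = codecs.open(src, encoding="latin1")
    try:
        # Characters without an entry in ENTITIES are written as numeric
        # character references (for example &#233;) instead of raising.
        fout = codecs.open(dst, "w", encoding="ascii",
                           errors="xmlcharrefreplace")
        try:
            for line in fin:
                fout.write(to_html_entities(line))
        finally:
            fout.close()
    finally:
        fin.close()
```

A call such as convert_file('abc.txt', 'abc.html') would then produce an
ASCII-only file. The explicit dictionary keeps the mapping visible; the
standard library also ships a full code point to entity name table
(htmlentitydefs.codepoint2name in Python 2) that could replace it.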