I used codecs.EncodedFile successfully with Python 2.4 to convert a set of database records from latin-1 to utf-8, but the same program raises an exception under Python 2.5. This small example shows the problem:

import codecs

fo = open('test.dat', 'w')
fo.write('G\xe2teaux')
fo.close()
fi = open("test.dat",'r')
fx = codecs.EncodedFile(fi, 'utf-8', 'latin-1')
astring = fx.readline()
print astring
ustring = unicode(astring, 'utf-8' )
print repr(ustring)
print ustring.encode('latin-1')
print ustring.encode('utf-8')

Python 2.4 gives:

Gâteaux
u'G\xe2teaux'
Gâteaux
Gâteaux

which I believe is correct, while 2.5 produces

Traceback (most recent call last):
  File "test_codec.py", line 8, in <module>
    astring = fx.readline()
  File "C:\Python25\lib\codecs.py", line 709, in readline
    data = self.reader.readline()
  File "C:\Python25\lib\codecs.py", line 471, in readline
    data = self.read(readsize, firstline=True)
  File "C:\Python25\lib\codecs.py", line 418, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data

Is there a genuine problem here, or have I been misusing this function?

--
Regards
David Hughes
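
For comparison, here is a minimal sketch of the same latin-1 to utf-8 conversion done by hand, without codecs.EncodedFile. It only cross-checks the bytes the example above expects and does not diagnose the 2.5 behaviour. The helper name recode_line and the temporary file location are hypothetical, and the sketch assumes Python 2.6 or later (or Python 3) for the b'...' literals and the with statement.

```python
import os
import tempfile


def recode_line(raw, src='latin-1', dst='utf-8'):
    """Decode raw bytes from the src encoding and re-encode them as dst."""
    return raw.decode(src).encode(dst)


# Write the same latin-1 bytes as the example, in binary mode so that
# no newline or encoding translation is applied.
path = os.path.join(tempfile.mkdtemp(), 'test.dat')
with open(path, 'wb') as fo:
    fo.write(b'G\xe2teaux')

# Read the raw bytes back and convert them explicitly.
with open(path, 'rb') as fi:
    raw_line = fi.readline()

utf8_line = recode_line(raw_line)
print(repr(utf8_line))
```

The latin-1 byte \xe2 becomes the two-byte utf-8 sequence \xc3\xa2, so decoding utf8_line as utf-8 gives back the same text as the repr shown for Python 2.4 above.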