Martin v. Löwis wrote:
> Well, if the document is UTF-8, you should decode it as UTF-8, of
> course.

Thanks. This and:

http://en.wikipedia.org/wiki/UTF-8

solved my problem with understanding the encoding.

Anton

Proof that I understand it now (please, anyone, prove me wrong if you can):

from zipfile import ZipFile

def by80(seq):
    # Slice by index so the generator stops at the end of the data.
    for start in range(0, len(seq), 80):
        yield seq[start:start + 80]

def utfCheck(infn):
    # Open the archive for reading.
    zin = ZipFile(infn, 'r')
    try:
        # content.xml is stored as UTF-8 bytes; decode to a unicode string.
        data = zin.read('content.xml').decode('utf-8')
    finally:
        zin.close()
    for line in by80(data):
        # Re-encode for a Windows-1252 console; 'replace' avoids a
        # UnicodeEncodeError on characters cp1252 cannot represent.
        print line.encode('cp1252', 'replace')

def test():
    infn = "xxx.sxw"
    utfCheck(infn)

if __name__=='__main__':
    test()
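
For anyone reading this on Python 3, the same check might look roughly like the sketch below. It is only an illustration under assumptions: `chunks`, `read_content` and `make_sample` are made-up helper names, and the archive is built in memory as a stand-in for a real .sxw file, so nothing here depends on a file on disk.

```python
import io
from zipfile import ZipFile


def chunks(text, width=80):
    """Yield successive slices of text, each at most width characters long."""
    for start in range(0, len(text), width):
        yield text[start:start + width]


def read_content(zip_source, member="content.xml"):
    """Read one member of a zip archive and decode its bytes as UTF-8."""
    with ZipFile(zip_source) as zin:
        return zin.read(member).decode("utf-8")


def make_sample():
    """Build a tiny in-memory archive (hypothetical stand-in for xxx.sxw)."""
    buf = io.BytesIO()
    with ZipFile(buf, "w") as zout:
        # The member is written as UTF-8 bytes, as in a real content.xml.
        zout.writestr("content.xml", "<p>Löwis €</p>".encode("utf-8"))
    buf.seek(0)
    return buf
```

Something like `for line in chunks(read_content("xxx.sxw")): print(line)` would then print the document 80 characters at a time. In Python 3, `print` encodes for the console itself, so the explicit `encode` step is no longer needed there.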