Skip Montanaro wrote:

> Peter> Isn't UTF-8 the default?
>
> Apparently not.

Sorry, I meant the default for XML.

> I believe in my reading it said that it used whatever
> locale.getpreferredencoding() returned. That's problematic when you
> live in a country that thinks ASCII is everything. Personally, I think
> UTF-8 should be the default, but that train's long left the station,
> at least for Python 2.x.
>
>> Try opening the file in binary mode then:
>>
>> with io.open(fname, "rb") as f:
>>     root = xml.etree.ElementTree.parse(f).getroot()
>
> Thanks, that worked. Would appreciate an explanation of why binary
> mode was necessary. It would seem that since the file contents are
> text, just in a non-ASCII encoding, that specifying the encoding when
> opening the file should do the trick.
>
> Skip

My tentative explanation would be: if you open the file as text, it will
be decoded successfully, i.e. io.open(fname, encoding="UTF-8").read()
works. But the XML parser needs bytes, and to get back to bytes the
"preferred encoding", in your case ASCII, will be used. Since the text
contains non-ASCII characters, you get a UnicodeEncodeError.
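To make the explanation above concrete, here is a minimal sketch. It is
not the code from the original report: the sample document, the temporary
file, and the variable names are made up for illustration, and the
explicit text.encode("ascii") call only stands in for the implicit
conversion that the explanation supposes happens behind the scenes.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document containing one non-ASCII character (e-acute).
xml_bytes = u'<?xml version="1.0" encoding="UTF-8"?><root>caf\u00e9</root>'.encode("utf-8")

# Write the sample to a temporary file so it can be reopened in both modes.
fd, fname = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(xml_bytes)

try:
    # Step 1: opening as text with the right encoding decodes without error.
    with io.open(fname, encoding="UTF-8") as f:
        text = f.read()

    # Step 2: going back to bytes via ASCII fails on the non-ASCII character.
    # (Assumption: this explicit call mimics the implicit re-encoding.)
    try:
        text.encode("ascii")
        ascii_roundtrip_failed = False
    except UnicodeEncodeError:
        ascii_roundtrip_failed = True

    # Step 3: in binary mode the parser receives the raw bytes and decodes
    # them itself, guided by the encoding named in the XML declaration.
    with io.open(fname, "rb") as f:
        root = ET.parse(f).getroot()
finally:
    os.remove(fname)

print(ascii_roundtrip_failed)
print(root.tag)
```

The point of step 3 is that no intermediate text-to-bytes conversion is
needed at all, so the locale's preferred encoding never comes into play.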