Thanks for the help thus far. To recap: when parsing XML, ElementTree is barfing on extended characters.
1. Yes, most XML is written by monkeys, or by programs written by such monkeys. Tough beans: I cannot make my input XML any cleaner without pre-processing, because I am not generating it.
2. The documentation suggests that the default encoding of ElementTree is US-ASCII, which is not going to be sufficient. My XML explicitly sets its encoding to ISO-8859-1, and the XML is actually well-formed(!).
3. I muddied the waters by talking about the encoding of the Python source listing, sorry.

EXAMPLES:

Vanilla (this works fine):

    #!/usr/bin/python
    from elementtree import ElementTree as etree

    eg = """<seuss><fish>red</fish><fish>blue</fish></seuss>"""
    xml = etree.fromstring(eg)

If I change the example string to this:

    <seuss><fish>red</fish><fish>blué</fish></seuss>

I get the following error:

    xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 32

Okay, the default encoding for my program (and thus my example string) is US-ASCII, so I'll use ISO-8859-1 instead, adding this line:

    # coding: iso-8859-1

I get the same error. Just for laughs I'll change the encoding to utf-8. Oops, I get the same error.

Has anyone had any luck getting ElementTree to deal with extended characters? If not, has anyone got a suggestion for how to pre-process the text in the XML so it won't barf?

Thanks.

--
yours,
William

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
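
P.S. For anyone trying to reproduce this, here is a minimal sketch. It assumes Python 3 and the standard-library xml.etree.ElementTree, not the standalone elementtree package imported above, so behaviour may differ on older setups. The names BODY, DECLARATION, parse_latin1 and last_fish are made up for illustration. It feeds the parser the example as ISO-8859-1 bytes with and without an XML declaration, since the example string above carries no declaration even though my real input does.

```python
import xml.etree.ElementTree as ET  # assumption: stdlib module on Python 3

# The failing example from above; \u00e9 is the "é" in "blué".
BODY = "<seuss><fish>red</fish><fish>blu\u00e9</fish></seuss>"

# XML declaration naming the encoding the bytes are actually in.
DECLARATION = '<?xml version="1.0" encoding="iso-8859-1"?>'


def parse_latin1(text, declare_encoding):
    """Encode text as ISO-8859-1 bytes and parse it.

    When declare_encoding is true, the XML declaration is prepended so the
    parser is told how to decode the bytes; otherwise it has to guess.
    """
    document = (DECLARATION + text) if declare_encoding else text
    return ET.fromstring(document.encode("iso-8859-1"))


def last_fish(root):
    """Return the text of the last <fish> child of the root element."""
    return root.findall("fish")[-1].text


if __name__ == "__main__":
    try:
        parse_latin1(BODY, declare_encoding=False)
    except ET.ParseError as err:
        # With no declaration the parser does not treat the bytes as ISO-8859-1.
        print("without declaration:", err)

    root = parse_latin1(BODY, declare_encoding=True)
    print("with declaration:", last_fish(root))
```

If the second case parses where the first does not, the thing to check is whether the declaration actually reaches the parser along with the raw bytes, rather than which coding line the Python source file has.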