On Thu, Jul 31, 2008 at 9:44 AM, william tanksley <[EMAIL PROTECTED]> wrote:
> I'm using a file, a file that's correctly encoded as UTF-8, and it
> returns some text elements that are raw bytes (undecoded). I have to
> manually decode them.
I can't reproduce this behavior. Here's a simple test case:

C:\Program Files\Python25>python -V
Python 2.5.2

C:\Program Files\Python25>more t.py
import xml.etree.cElementTree as ET
xml_string = """<?xml version="1.0" encoding="UTF-8"?>
<character title="GREEK SMALL LETTER PI">\xcf\x80</character>"""
outfile = open('sample.xml', 'wb')
outfile.write(xml_string)
outfile.close()
tree = ET.parse('sample.xml')
root = tree.getroot()
print type(root.text)
print repr(root.text)
print root.text

C:\Program Files\Python25>python t.py
<type 'unicode'>
u'\u03c0'
π

That seems to work as expected. I wrote out a UTF-8 encoded bytestring with a proper XML encoding declaration. When I parsed the file with cElementTree, it returned unicode data.

Does this same program work for you? If so, maybe you need to show us more of your code to see where things are going wrong.

-- 
Jerry
-- 
http://mail.python.org/mailman/listinfo/python-list
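For readers on Python 3, the test above needs a few changes: print is a function, the document has to be written as a bytes literal, and xml.etree.cElementTree was deprecated in 3.3 and removed in 3.9 (plain xml.etree.ElementTree picks up the C accelerator automatically). The following is a minimal sketch of an equivalent check, assuming Python 3.3 or later. The temporary directory is an addition not present in the original test, used only so the script does not leave sample.xml behind.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # replaces xml.etree.cElementTree on Python 3

# Same document as the Python 2.5 test: UTF-8 bytes with a matching
# encoding declaration. b"\xcf\x80" is the UTF-8 encoding of U+03C0 (pi).
xml_bytes = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
             b'<character title="GREEK SMALL LETTER PI">\xcf\x80</character>')

# Assumption: a throwaway temp directory instead of the current directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.xml")
    # Binary mode, so the bytes reach the file exactly as written above.
    with open(path, "wb") as outfile:
        outfile.write(xml_bytes)
    # The parser reads the declaration and decodes the character data itself.
    root = ET.parse(path).getroot()

print(type(root.text))   # <class 'str'>
print(ascii(root.text))  # '\u03c0'
```

Here the element text comes back as str, the Python 3 counterpart of the unicode object returned in the 2.5 run; ascii() is used so the output matches the escaped form shown there regardless of the console encoding.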