Hi,

I seem to be having a problem when using the PrettyPrint function (from PyXML's xml.dom.ext) to generate an XML document containing latin-1 characters (specifically, accented French text).

PYTHON CODE USED:
=================
# -*- coding: iso-8859-1 -*-
# XML Generation example

from xml.dom import implementation
from xml.dom.ext import PrettyPrint
import StringIO

# Create an XML document:
doc = implementation.createDocument(None, 'cases', None)
root = doc.documentElement

celem = doc.createElement('case')
pelem = doc.createElement('problem')
felem = doc.createElement('feature')
felem.appendChild(doc.createTextNode( "élève" )) # Method1 (see ERROR 1)
#felem.appendChild(doc.createTextNode( unicode("élève", 'latin-1') )) # Method2 (see ERROR 2)
felem.setAttribute("fid", "1")
pelem.appendChild(felem)
celem.appendChild(pelem)
root.appendChild(celem)
root.setAttribute("date", "jan 01 2005")

# Print generated XML document:
xml_str = StringIO.StringIO()
PrettyPrint(doc, xml_str) # failure point ?
print xml_str.getvalue()

ERROR 1 - OBTAINED WHEN ONLY USING "-*- coding: iso-8859-1 -*-" AS A SCRIPT HEADER:
==================================================================================
---------- Capture Output ----------
> "C:\Program Files\Python24\python.exe" genxml.py
Traceback (most recent call last):
  File "genxml.py", line 26, in ?
    PrettyPrint(doc, xml_str) # failure point ?
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\__init__.py", line 81, in PrettyPrint
    Printer.PrintWalker(visitor, root).run()
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 385, in run
    return self.step()
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 381, in step
    self.visitor.visit(self.start_node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 185, in visit
    return self.visitDocument(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 231, in visitDocument
    self.visitNodeList(node.childNodes, exclude=node.doctype)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 201, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 161, in visit
    return self.visitElement(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 270, in visitElement
    self.visitNodeList(node.childNodes)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 201, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 161, in visit
    return self.visitElement(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 270, in visitElement
    self.visitNodeList(node.childNodes)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 201, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 161, in visit
    return self.visitElement(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 270, in visitElement
    self.visitNodeList(node.childNodes)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 201, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 161, in visit
    return self.visitElement(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 270, in visitElement
    self.visitNodeList(node.childNodes)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 201, in visitNodeList
    curr is not exclude and self.visit(curr)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 167, in visit
    return self.visitText(node)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 293, in visitText
    text = TranslateCdata(text, self.encoding)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 118, in TranslateCdata
    new_string = charsetHandler(new_string, encoding)
  File "C:\Program Files\Python24\lib\site-packages\_xmlplus\dom\ext\Printer.py", line 44, in utf8_to_code
    text = unicode(text, "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
> Terminated with exit code 1.

ERROR 2 - WHEN EXPLICITLY ENCODING MY STRING AS UNICODE:
========================================================
---------- Capture Output ----------
> "C:\Program Files\Python24\python.exe" genxml.py
<?xml version='1.0' encoding='UTF-8'?>
<cases date='jan 01 2005'>
  <case>
    <problem>
      <feature fid='1'>éleve</feature>
    </problem>
  </case>
</cases>
> Terminated with exit code 0.

COMMENT:
As can be seen, when using Method1 (a plain string, relying only on the iso-8859-1 coding header), I get a UnicodeDecodeError. When using Method2, explicitly converting the string with unicode("élève", 'latin-1'), PrettyPrint does not raise an exception, but it garbles (does not correctly render) my latin-1 string (i.e. élève).

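For reference, the last frame of the traceback (text = unicode(text, "utf-8") in utf8_to_code) suggests the printer tries to decode byte strings as UTF-8. The sketch below uses only Python's built-in codecs, not PyXML, to show why latin-1 bytes for "élève" fail that decode while a unicode value does not hit the problem. The helper name decodes_as_utf8 is hypothetical, and the last step (UTF-8 output being read back as latin-1) is only one possible explanation for the garbled Method2 output, not something the capture above confirms.

```python
# -*- coding: utf-8 -*-
# Sketch only: plain codecs, no PyXML calls are made here.

# u"élève", written with escapes so the file encoding does not matter.
text = u"\xe9l\xe8ve"

# Bytes a latin-1 source file would hold for the plain string (Method1).
latin1_bytes = text.encode("latin-1")

# Bytes a serializer declaring encoding='UTF-8' would be expected to emit.
utf8_bytes = text.encode("utf-8")


def decodes_as_utf8(raw):
    """Hypothetical helper: True if raw is valid UTF-8, else False."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Method1: latin-1 bytes are not valid UTF-8, so the decode fails.
latin1_is_valid_utf8 = decodes_as_utf8(latin1_bytes)

# Method2: decoding with the right codec first recovers the text.
roundtrip = latin1_bytes.decode("latin-1")
utf8_is_valid_utf8 = decodes_as_utf8(utf8_bytes)

# Assumption: a viewer that reads UTF-8 output as latin-1 shows two
# characters for each accented letter instead of one.
seen_as_latin1 = utf8_bytes.decode("latin-1")

print(latin1_is_valid_utf8, utf8_is_valid_utf8)
print(len(latin1_bytes), len(utf8_bytes), len(seen_as_latin1))
```

If this reading is right, passing unicode objects to createTextNode (Method2) is the safer input, and the remaining question is how the UTF-8 output is being displayed or re-encoded.
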
EXTRA DETAILS:
==============
* Running on Windows XP (SP2)
* Python 2.4.2
* PyXML 0.8.4
* 4Suite 1.0b1
* I have tried many other encoding formats such as utf8, utf-16, utf16-le, etc. with no luck!

Any comments or suggestions would be most appreciated.

Regards,
Michel
_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig