Hello all,

There is a document out there on the 'net that appears to be an XHTML document:

    <!doctype html public "-//w3c//dtd xhtml 1.0 transitional//en" "http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml">

Great, right? But unfortunately it's malformed in a number of ways (mismatched tags, tag case problems, unescaped '&' in URLs, etc.). Neither minidom.parseStream() nor xml.dom.ext.reader.Sax2.Reader.fromStream() will parse it correctly:

    xml.sax._exceptions.SAXParseException: foo.html:2:0: syntax error

And even if one gets rid of the bogus doctype declaration, the rest of the document just makes the parsers fall over:

    xml.sax._exceptions.SAXParseException: foo.html:14:53: not well-formed (invalid token)

My next thought was to parse this with xml.dom.ext.reader.HtmlLib, but HtmlLib doesn't like the namespace declarations:

    xml.dom.NamespaceErr: Invalid or illegal namespace operation

I need to parse this document into a DOM, make some changes, and then spit back out the modified file as (X?)HTML (ideally well-formed).

Am I going to be able to do this with PyXML? If not, I'd love to hear your suggestions for the appropriate tools.

Thanks!

-- Lars

-- 
Lars Kellogg-Stedman <[EMAIL PROTECTED]>
_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
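
A minimal sketch of the "not well-formed (invalid token)" failure described above, for anyone who wants to reproduce it without the original file. It uses the standard library's xml.dom.minidom rather than the PyXML readers named in the post, so the exception type is ExpatError instead of SAXParseException. The BROKEN fragment, the example.com URL, and the try_parse helper are assumptions made up for illustration; they are not taken from the real document.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical fragment standing in for the real page: the '&' in the
# URL is not escaped, which XML does not allow.
BROKEN = '<p><a href="http://example.com/?a=1&b=2">link</a></p>'

# Escaping the ampersand is enough to make this particular fragment well-formed.
FIXED = BROKEN.replace("&", "&amp;")


def try_parse(text):
    """Return (document, None) on success or (None, error message) on failure."""
    try:
        return minidom.parseString(text), None
    except ExpatError as err:
        return None, str(err)


broken_doc, broken_error = try_parse(BROKEN)
fixed_doc, fixed_error = try_parse(FIXED)

print("broken:", broken_error)
print("fixed: ", fixed_doc.documentElement.toxml())
```

This only demonstrates one of the problems listed (unescaped '&' in URLs). Mismatched tags and tag case problems would need a more forgiving parser or a cleanup pass before any strict XML parser accepts the file.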