I am using lxml to process some XHTML files. The files have HTML character codes (numeric character references) embedded in them, for instance &#39; rather than a literal '. When I parse the files, edit them, and write them back out, I want my edits to be the only changes in the output files, but lxml is also replacing the character codes with the actual characters they represent.
So if I have:

    It&#39;s an example.

it is writing out:

    It's an example.

I've tried setting resolve_entities to False, a la:

    tree = etree.parse(input, etree.XMLParser(resolve_entities=False))

but this seems to have no effect. Is there a way to tell lxml to ignore these and leave them as is?

Thanks.
-s
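
For reference, here is a minimal sketch of the behavior described above, plus one possible workaround. It rests on an assumption worth checking against the lxml documentation: resolve_entities appears to govern named entity references only, while &#39; is a numeric character reference, which the XML parser decodes before the tree is built. The workaround hides each reference behind private-use sentinel characters before parsing and restores it after serializing. The helpers protect_char_refs and restore_char_refs are hypothetical names, not part of lxml, and the sketch assumes U+E000 and U+E001 never occur in the real input. The import falls back to the standard library's ElementTree only so that the sketch still runs where lxml is not installed.

```python
import re

try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Sentinels from the Unicode private use area (assumed absent from the input).
OPEN, CLOSE = "\ue000", "\ue001"

# Matches decimal (&#39;) and hexadecimal (&#x27;) character references.
CHAR_REF = re.compile(r"&#([0-9]+|[xX][0-9a-fA-F]+);")
PROTECTED = re.compile(OPEN + "([^" + CLOSE + "]*)" + CLOSE)


def protect_char_refs(text):
    """Hide character references from the parser (hypothetical helper)."""
    return CHAR_REF.sub(lambda m: OPEN + m.group(1) + CLOSE, text)


def restore_char_refs(text):
    """Turn the sentinels back into the original references."""
    return PROTECTED.sub(lambda m: "&#" + m.group(1) + ";", text)


source = "<p>It&#39;s an example.</p>"

# Plain round trip: the parser decodes the reference to a literal apostrophe.
plain = etree.tostring(etree.fromstring(source), encoding="unicode")

# Protected round trip: parse, make an edit, serialize, then restore.
root = etree.fromstring(protect_char_refs(source))
root.set("class", "edited")
out = restore_char_refs(etree.tostring(root, encoding="unicode"))

print(plain)
print(out)
```

The trade-off is that the text seen through the tree contains the sentinels rather than the apostrophe, so any edit that searches or compares element text has to account for them.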