(this is being discussed on the lxml mailing list)
spencer.c wrote:
> I am using lxml to process some xhtml files. The files have html character
> codes embedded in them. For instance: &#39; rather than a '. When I parse
> the files, edit them, and then write them back out, I want my edits to be
> the only changes in the output files, but lxml is replacing the character
> codes with the actual characters they are supposed to represent as well.
>
> So if I have:
> It& #39;s an example. <-- Space inserted to help readability.
>
> It is writing out:
> It's an example.
>
> I've tried setting resolve_entities to false, à la:
> tree = etree.parse(input, etree.XMLParser(resolve_entities=False))
>
> But this seems to have no effect.
>
> Is there a way to tell lxml to ignore these and leave them as is?
>
> Thanks.
>
> -s

_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
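As we understand it, a numeric character reference such as &#39; is not an entity reference. The XML parser (libxml2 underneath lxml, expat underneath the standard library) always replaces it with the character it names. That would explain why resolve_entities=False, which governs entity references, makes no difference here. One possible workaround is to hide the "&#" from the parser before parsing and restore it after serializing. The sketch below uses the standard library's xml.etree.ElementTree so it runs without lxml installed. The idea should carry over to lxml's etree.fromstring/etree.tostring, but that has not been verified here. The helper names protect_char_refs and restore_char_refs are hypothetical and are not part of lxml or ElementTree.

```python
import xml.etree.ElementTree as ET


def protect_char_refs(xml_text: str) -> str:
    """Hypothetical helper: escape the '&' of every numeric character
    reference (&#39;, &#x27;, ...) so the parser keeps it as literal text."""
    return xml_text.replace("&#", "&amp;#")


def restore_char_refs(xml_text: str) -> str:
    """Hypothetical helper: undo protect_char_refs() on serialized output."""
    return xml_text.replace("&amp;#", "&#")


source = "<p>It&#39;s an example.</p>"

# Plain round trip: the parser resolves &#39; to an apostrophe.
plain = ET.tostring(ET.fromstring(source), encoding="unicode")

# Protected round trip: the reference survives parsing and serialization.
root = ET.fromstring(protect_char_refs(source))
root.set("class", "edited")  # stand-in for the user's own edits
protected = restore_char_refs(ET.tostring(root, encoding="unicode"))

print(plain)      # <p>It's an example.</p>
print(protected)  # <p class="edited">It&#39;s an example.</p>
```

The trick is purely textual. A document that already contains the literal text "&amp;#" would be altered by restore_char_refs, so it is only safe when that sequence cannot occur in the input.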