Re: [XML-SIG] lxml - html entities

Stefan Behnel Mon, 28 Jul 2008 22:44:55 -0700

(this is being discussed on the lxml mailing list)


spencer.c wrote:
> I am using lxml to process some xhtml files.  The files have html character
> codes embedded in them.  For instance: &#39; rather than a '.  When I parse
> the files, edit them, and then write them back out, I want my edits to be
> the only changes in the output files, but lxml is replacing the character
> codes with the actual characters they are supposed to represent as well.
> 
> So if I have:
> It& #39;s an example. <-- Space inserted to help readability.
> 
> It is writing out:
> It's an example.  
> 
> I've tried setting resolve_entities to False, like so:
> tree = etree.parse(input, etree.XMLParser(resolve_entities=False))
> 
> But this seems to have no effect.
> 
> Is there a way to tell lxml to ignore these, i.e. leave them as they are?
> 
> Thanks.
> 
> -s
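For reference, here is a minimal sketch of the behavior described above. It assumes Python 3 and uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra dependencies. &#39; is a numeric character reference, not an entity reference. Any conforming XML parser replaces it with the apostrophe while parsing. lxml's resolve_entities option concerns entity references (for example ones declared in a DTD), so setting it to False does not affect character references. The second half shows one possible workaround. This is a text-level trick, not something proposed in this thread: escape the ampersand of every "&#" before parsing so the reference survives as literal text, then undo that after serializing. The helper names protect_char_refs and restore_char_refs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample input, following the example from the question.
SOURCE = "<p>It&#39;s an example.</p>"

# 1. The parser turns the character reference into a real apostrophe,
#    so the tree only ever holds "It's an example.".
plain_root = ET.fromstring(SOURCE)
plain_out = ET.tostring(plain_root, encoding="unicode")
print(plain_out)  # <p>It's an example.</p>


# 2. Hypothetical workaround (not an lxml or ElementTree feature):
#    hide character references from the parser, then restore them.
def protect_char_refs(xml_text: str) -> str:
    """Escape the '&' of every '&#' so the parser keeps it as literal text."""
    return xml_text.replace("&#", "&amp;#")


def restore_char_refs(xml_text: str) -> str:
    """Undo protect_char_refs on the serialized output."""
    return xml_text.replace("&amp;#", "&#")


protected_root = ET.fromstring(protect_char_refs(SOURCE))
print(protected_root.text)  # It&#39;s an example.  (literal text, not an apostrophe)
protected_root.set("class", "edited")  # stands in for the script's own edits
out = restore_char_refs(ET.tostring(protected_root, encoding="unicode"))
print(out)  # <p class="edited">It&#39;s an example.</p>
```

The trick has clear limits. While the tree is loaded, the script sees the literal text "&#39;" rather than an apostrophe, so searches for "'" will not match. An input that already contains an escaped "&amp;#" as ordinary text would also be turned into a real character reference by the restore step.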
_______________________________________________
XML-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/xml-sig

