You can try JTidy; it has an XHTML formatter.

Markus Wiederkehr wrote:
Sorry if this question is a bit off topic, but I thought people on this mailing list might know...

I'm working on a Tapestry application that has to display arbitrary HTML files inline (like GMail does when you receive an HTML e-mail, for example). So basically I want to develop a Tapestry component that can render an HTML file.

The obvious requirement is that I don't want my page to be corrupted by the HTML file, so I would have to remove /bad/ elements like SCRIPT, /bad/ attributes like ID or NAME, etc. In addition, the result has to be compliant with XHTML 1.0 Transitional, no matter how sloppy the original HTML file is.

I tried running the HTML through NekoHTML to create a DOM and then removing the bad elements and attributes from that DOM, but the result still might not be XHTML-compliant because of missing or misplaced elements. So that approach looks like a lot of work...

Has anyone done something like this, or does anyone have a better idea of how to accomplish this?

Thanks,
Markus

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
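Whichever parser produces the DOM (NekoHTML, as tried above, or JTidy's `Tidy.parseDOM`), the "remove bad elements and attributes" step can be sketched with nothing but the JDK's `org.w3c.dom` API. This is a minimal illustration under stated assumptions: the class name `HtmlSanitizer` and both blacklists are hypothetical, and to stay self-contained it parses already well-formed markup with the JDK parser instead of NekoHTML. It only strips elements and attributes. It does not by itself make the result valid XHTML 1.0 Transitional, which is the part the JTidy suggestion addresses.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper sketching the "remove bad elements and attributes" step.
// The blacklists are illustrative examples, not a complete XSS filter.
final class HtmlSanitizer {
    // Elements dropped together with everything inside them.
    private static final Set<String> BAD_ELEMENTS =
            new HashSet<>(Arrays.asList("script", "style", "iframe", "object"));
    // Attributes that could clash with the host page, as in the question.
    private static final Set<String> BAD_ATTRIBUTES =
            new HashSet<>(Arrays.asList("id", "name"));

    private HtmlSanitizer() {}

    // Recursively removes blacklisted elements and attributes below `node`.
    static void clean(Node node) {
        // Snapshot the children first: removeChild changes the sibling chain.
        List<Node> children = new ArrayList<>();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            children.add(c);
        }
        for (Node child : children) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // text, comments etc. are kept as-is
            }
            Element el = (Element) child;
            if (BAD_ELEMENTS.contains(el.getTagName().toLowerCase(Locale.ROOT))) {
                node.removeChild(el);
                continue;
            }
            // Collect names before removing anything; the NamedNodeMap is live.
            NamedNodeMap attrs = el.getAttributes();
            List<String> doomed = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                String name = attrs.item(i).getNodeName();
                String lower = name.toLowerCase(Locale.ROOT);
                // "on*" catches event handlers such as onclick or onload.
                if (BAD_ATTRIBUTES.contains(lower) || lower.startsWith("on")) {
                    doomed.add(name);
                }
            }
            for (String name : doomed) {
                el.removeAttribute(name);
            }
            clean(el);
        }
    }

    // Parses well-formed markup, cleans it and serializes it back to a string.
    // Assumption: real-world HTML would first go through a lenient parser
    // (NekoHTML, JTidy); the JDK parser used here rejects malformed input.
    static String sanitize(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            clean(doc);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not sanitize markup", e);
        }
    }
}
```

For example, `HtmlSanitizer.sanitize("<div id=\"x\" onclick=\"evil()\"><p name=\"n\">Hi</p><script>alert(1)</script></div>")` keeps the `div` and `p` with their text but drops the `script` element and the `id`, `name` and `onclick` attributes. A blacklist like this is easy to get wrong, since there are many ways to smuggle script into HTML, so a whitelist of allowed elements and attributes is usually the safer design.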
