kesh...@us.ibm.com wrote on 07.01.2012 at 21:18 (-0500):
> The problem is, HTML is not an XML-based language, so unless you've
> deliberately written your input document as XHTML, odds are that no
> XML parser will accept it.

Sure, but as you're saying:

> There are HTML parsers available which produce SAX or DOM (XML)
> output. You could get one of those, use it to read the input document,
> and route its output to Xalan for processing.
> 
> Or you could look for a tool which rewrites HTML as XHTML. I believe
> the W3C's "tidy" tool can be configured to do that. Then you'd run the
> resulting XHTML document (which _is_ XML) through Xalan.

Choices include:

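* tidy (rewrites HTML as XHTML, e.g. with -asxhtml)
* libxml2 (has a forgiving HTML parser; try xmllint --html --xmlout)
* TagSoup (Java; a SAX parser for real-world HTML)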
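
To illustrate the general idea (a forgiving HTML parser feeding an XML tree), here is a rough sketch in Python. It uses only the standard library as a stand-in and is not tidy, libxml2 or TagSoup; the names SoupToXml and html_to_xml are made up for this example. It assumes the input has a single root element and does not apply HTML's implied end-tag rules, so a real tool will produce a better tree.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SoupToXml(HTMLParser):
    """Hypothetical helper: route forgiving HTML parse events into an XML tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # stack of elements still waiting for an end tag

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "checked") get their own name as value.
        xml_attrs = {k: (v if v is not None else k) for k, v in attrs}
        self.builder.start(tag, xml_attrs)
        if tag in VOID:
            self.builder.end(tag)  # close void elements right away
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: ignore it
        # Close everything left open inside this element, then the element.
        while self.open_tags:
            current = self.open_tags.pop()
            self.builder.end(current)
            if current == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # drop text outside the root element
            self.builder.data(data)

    def finish(self):
        self.close()  # flush any text still buffered by the parser
        while self.open_tags:  # close whatever the input never closed
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def html_to_xml(html_text):
    """Return a well-formed XML serialization of (single-rooted) HTML."""
    parser = SoupToXml()
    parser.feed(html_text)
    return ET.tostring(parser.finish(), encoding="unicode")


if __name__ == "__main__":
    print(html_to_xml("<html><body><p>one<br><p>two &amp; three</body></html>"))
```

The resulting string is well-formed XML, so it can be handed to an XSLT processor such as Xalan.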

-- 
Michael Ludwig
