kesh...@us.ibm.com wrote on 07.01.2012 at 21:18 (-0500):

> The problem is, HTML is not an XML-based language, so unless you've
> deliberately written your input document as XHTML, odds are that no
> XML parser will accept it.

Sure, but as you're saying:

> There are HTML parsers available which produce SAX or DOM (XML)
> output. You could get one of those, use it to read the input document,
> and route its output to Xalan for processing.
>
> Or you could look for a tool which rewrites HTML as XHTML. I believe
> the W3C's "tidy" tool can be configured to do that. Then you'd run the
> resulting XHTML document (which _is_ XML) through Xalan.

Choices include:

* tidy
* libxml2
* tagsoup

-- 
Michael Ludwig
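
To illustrate the first option (an HTML parser that hands you an XML tree), here is a minimal Python sketch using only the standard library. It is not tidy, libxml2 or TagSoup, and it is far less forgiving than any of them: it does not infer implied end tags, so an unclosed `<li>` simply nests inside the previous one. The names `HtmlToTree`, `html_to_xml` and the wrapper element `document` are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HtmlToTree(HTMLParser):
    """Builds an ElementTree from loosely written HTML (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Hypothetical wrapper element so the result always has one root.
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") get their own name as value.
        attrib = {k: (v if v is not None else k) for k, v in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close everything up to the nearest matching open element;
        # stray end tags with no match are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html_text):
    """Return a well-formed XML string for the given HTML text."""
    parser = HtmlToTree()
    parser.feed(html_text)
    parser.close()  # flushes any buffered text
    return ET.tostring(parser.root, encoding="unicode")


if __name__ == "__main__":
    print(html_to_xml("<ul><li>one<li>two<br></ul>"))
```

The resulting string is well-formed XML, so it could in principle be passed to an XSLT processor such as Xalan. For real-world HTML, one of the tools listed above is the better choice.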