I recall using a small Java library called Neko (NekoHTML) to parse HTML into an XML DOM. It did the trick! Just wanted to add it to the list.
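
For the second half of the pipeline discussed in the quoted thread below (handing XML to the XSLT processor), here is a minimal sketch that uses only the JDK's JAXP API. The class name, the stylesheet, and the sample page are made up for illustration, and the input is assumed to be well-formed XHTML already. For real-world HTML, an HTML parser that emits SAX or DOM (Neko, TagSoup) would supply the Source in place of the StreamSource; that part is not shown because it needs an extra jar. With Xalan on the classpath, TransformerFactory.newInstance() should pick it up; otherwise the JDK's built-in processor is used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: runs an XHTML document through an XSLT stylesheet.
class XhtmlThroughXslt {

    // Illustrative stylesheet (an assumption, not from the thread):
    // collects every <a> element of an XHTML page into a flat <links> list.
    // XHTML elements live in a namespace, so the match needs a prefix.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:h='http://www.w3.org/1999/xhtml'"
      + " exclude-result-prefixes='h'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<links><xsl:for-each select='//h:a'>"
      + "<link href='{@href}'><xsl:value-of select='.'/></link>"
      + "</xsl:for-each></links>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XHTML text and returns the serialized result.
    static String transform(String xhtml, String xslt) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        // An HTML parser producing SAX or DOM would replace this StreamSource
        // (with a SAXSource or DOMSource) when the input is not well-formed.
        transformer.transform(new StreamSource(new StringReader(xhtml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xhtml =
            "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
          + "<p>See <a href='http://example.org/'>the example</a>.</p>"
          + "</body></html>";
        System.out.println(transform(xhtml, XSLT));
    }
}
```

Feeding it plain HTML with unclosed tags fails at the parse step, which is exactly where a tool from the list below comes in.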
tlj

----- Original Message -----
From: Michael Ludwig [mailto:mil...@gmx.de]
Sent: Sunday, January 08, 2012 04:57 AM
To: xalan-j-users@xml.apache.org <xalan-j-users@xml.apache.org>
Subject: Re: Basic XSL HTML-> XML query

kesh...@us.ibm.com wrote on 07.01.2012 at 21:18 (-0500):

> The problem is, HTML is not an XML-based language, so unless you've
> deliberately written your input document as XHTML, odds are that no
> XML parser will accept it.

Sure, but as you're saying:

> There are HTML parsers available which produce SAX or DOM (XML)
> output. You could get one of those, use it to read the input document,
> and route its output to Xalan for processing.
>
> Or you could look for a tool which rewrites HTML as XHTML. I believe
> the W3C's "tidy" tool can be configured to do that. Then you'd run the
> resulting XHTML document (which _is_ XML) through Xalan.

Choices include:

* tidy
* libxml2
* tagsoup

-- 
Michael Ludwig