Thanks for all of the input on this. I was aware of the problems with parsing HTML in vanilla Xalan-J, but I have just discovered that, by a stroke of luck, the source docs are actually XHTML anyway [1].
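Because XHTML is well-formed XML, it can be handed to an XSLT processor directly, with no HTML-to-XML conversion step first. The following is only a minimal sketch, assuming the standard JAXP API (javax.xml.transform) that ships with the JDK and that Xalan-J also implements. The class name XhtmlToXml, the stylesheet, and the output element names doc, title, and para are hypothetical and are not taken from the source documents. One point worth noting: XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so the XPath expressions need a bound prefix (h: here), otherwise they match nothing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XhtmlToXml {
    // Hypothetical stylesheet: copies the page title and the text of each
    // <p> element into a simple, made-up XML vocabulary.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:h='http://www.w3.org/1999/xhtml'"
      + " exclude-result-prefixes='h'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<doc>"
      + "<title><xsl:value-of select='/h:html/h:head/h:title'/></title>"
      + "<xsl:for-each select='//h:p'>"
      + "<para><xsl:value-of select='.'/></para>"
      + "</xsl:for-each>"
      + "</doc>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over an XHTML string and returns the result as a string.
    static String transform(String xhtml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xhtml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input; a real document would be read from a file or URL.
        String xhtml =
            "<html xmlns='http://www.w3.org/1999/xhtml'>"
          + "<head><title>Demo</title></head>"
          + "<body><p>Hello</p></body></html>";
        System.out.println(transform(xhtml));
    }
}
```

If Xalan-J is on the classpath, TransformerFactory.newInstance() should normally pick it up through the usual JAXP lookup; otherwise the JDK's built-in processor is used.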
Again, thank you for the help.

Lewis

[1] https://github.com/lewismc/yax/blob/master/sts/0.1.html

________________________________________
From: Timothy Jones [timothy.jo...@syniverse.com]
Sent: 08 January 2012 15:24
To: mil...@gmx.de; xalan-j-users@xml.apache.org
Subject: Re: Basic XSL HTML-> XML query

I recall using a small Java library called Neko to parse HTML into an XML DOM. It did the trick! Just wanted to add it to the list.

tlj

----- Original Message -----
From: Michael Ludwig [mailto:mil...@gmx.de]
Sent: Sunday, January 08, 2012 04:57 AM
To: xalan-j-users@xml.apache.org <xalan-j-users@xml.apache.org>
Subject: Re: Basic XSL HTML-> XML query

kesh...@us.ibm.com wrote on 07.01.2012 at 21:18 (-0500):

> The problem is, HTML is not an XML-based language, so unless you've
> deliberately written your input document as XHTML, odds are that no
> XML parser will accept it.

Sure, but as you're saying:

> There are HTML parsers available which produce SAX or DOM (XML)
> output. You could get one of those, use it to read the input document,
> and route its output to Xalan for processing.
>
> Or you could look for a tool which rewrites HTML as XHTML. I believe
> the W3C's "tidy" tool can be configured to do that. Then you'd run the
> resulting XHTML document (which _is_ XML) through Xalan.

Choices include:

* tidy
* libxml2
* tagsoup

--
Michael Ludwig