Thanks for all of the input on this. I was aware of the problems parsing HTML
with vanilla Xalan-j, but I just discovered that, by a stroke of luck, the source
docs are actually XHTML anyway [1].
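
For anyone finding this thread later, here is a minimal sketch of what that
means in practice: since XHTML is well-formed XML, it can go straight through
the standard JAXP API, which Xalan-j implements. The class name, the stylesheet
and the sample document in the note below are hypothetical, and the sketch
assumes the source docs declare the XHTML namespace, in which case the
stylesheet has to bind a prefix to it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XhtmlTransformSketch {

    // Hypothetical stylesheet: copies the page title into a small XML document.
    // The "x" prefix is bound to the XHTML namespace; an unprefixed path such
    // as /html/head/title would select nothing in a namespaced XHTML document.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:x='http://www.w3.org/1999/xhtml'"
        + " exclude-result-prefixes='x'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<doc><title><xsl:value-of select='/x:html/x:head/x:title'/></title></doc>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given XHTML text and returns the result.
    static String transform(String xhtml, String xslt) {
        try {
            // Picks up Xalan-j if it is on the classpath, else the JDK's built-in processor.
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }
}
```

Calling `XhtmlTransformSketch.transform(xhtml, XhtmlTransformSketch.XSLT)` on a
document whose root is `<html xmlns='http://www.w3.org/1999/xhtml'>` should
yield a `<doc>` element holding the page title.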

Again, thank you for the help.

Lewis

[1] https://github.com/lewismc/yax/blob/master/sts/0.1.html

________________________________________
From: Timothy Jones [timothy.jo...@syniverse.com]
Sent: 08 January 2012 15:24
To: mil...@gmx.de; xalan-j-users@xml.apache.org
Subject: Re: Basic XSL HTML-> XML query

I recall using a small Java library called Neko to parse HTML into an XML DOM.
It did the trick! Just wanted to add it to the list.


tlj

----- Original Message -----
From: Michael Ludwig [mailto:mil...@gmx.de]
Sent: Sunday, January 08, 2012 04:57 AM
To: xalan-j-users@xml.apache.org <xalan-j-users@xml.apache.org>
Subject: Re: Basic XSL HTML-> XML query

kesh...@us.ibm.com wrote on 07.01.2012 at 21:18 (-0500):
> The problem is, HTML is not an XML-based language, so unless you've
> deliberately written your input document as XHTML, odds are that no
> XML parser will accept it.

Sure, but as you're saying:

> There are HTML parsers available which produce SAX or DOM (XML)
> output. You could get one of those, use it to read the input document,
> and route its output to Xalan for processing.
>
> Or you could look for a tool which rewrites HTML as XHTML. I believe
> the W3C's "tidy" tool can be configured to do that. Then you'd run the
> resulting XHTML document (which _is_ XML) through Xalan.

Choices include:

* tidy
* libxml2
* tagsoup
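
The "route its output to Xalan" step can be sketched roughly as follows. An
HTML parser that exposes the SAX `XMLReader` interface can be wrapped in a
`SAXSource` and handed to a JAXP `Transformer`. The class and method names are
hypothetical. To stay runnable without extra jars, the sketch uses the JDK's
own XML reader as a stand-in and an identity transform instead of a real
stylesheet; TagSoup's `org.ccil.cowan.tagsoup.Parser` implements `XMLReader`
and could be passed in its place, though that path is not exercised here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

final class SaxToXalanSketch {

    // Feeds the SAX events produced by the given reader into an identity
    // transform and returns the serialized XML. For a real stylesheet, use
    // TransformerFactory.newTransformer(Source) instead of newTransformer().
    static String serialize(XMLReader reader, String markup) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            SAXSource source =
                new SAXSource(reader, new InputSource(new StringReader(markup)));
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    // Stand-in reader: the JDK's strict XML parser, which only accepts
    // well-formed input. With the TagSoup jar on the classpath, an
    // org.ccil.cowan.tagsoup.Parser instance could be used here instead
    // to accept real-world HTML.
    static XMLReader defaultReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }
}
```

The point of taking an `XMLReader` parameter is that the transform side never
needs to know whether the events came from an XML parser or an HTML one.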

--
Michael Ludwig