Re: [Jprogramming] Parsing HTML in J

Michael Dykman Fri, 05 Jan 2007 12:51:37 -0800


> Good idea!  If I do that, I can extract pieces with XPath and forgo J
> parsing altogether.  In fact, I have a shell script which does exactly
> that, somewhere....


Since you are willing to forgo J for this part, I will mention that
I have always had excellent luck with Tidy, which originated at
the W3C and is now independently maintained.  Its output is
compliant HTML.  I have used it to convert random HTML into XHTML and
then used an XML parser to turn that into a DOM; it fixes a
surprisingly wide range of markup errors.

http://tidy.sourceforge.net/
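
For anyone who wants to try the Tidy-then-XML-parser route, here is a
minimal sketch in Python (not J, and not the shell script mentioned
above).  It assumes a `tidy` executable is on the PATH and that its
`-asxhtml`, `-numeric` and `-q` options behave as documented; the
function names `tidy_to_xhtml` and `link_targets` are made up for
illustration.

```python
import shutil
import subprocess
from xml.dom import minidom


def tidy_to_xhtml(html: str) -> str:
    """Pipe HTML through the external `tidy` program and return XHTML.

    Assumption: `tidy` is installed and on the PATH.
    """
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(
        # -q: quiet, -asxhtml: emit XHTML,
        # -numeric: numeric entities, so the XML parser needs no DTD
        ["tidy", "-q", "-asxhtml", "-numeric"],
        input=html,
        capture_output=True,
        text=True,
    )
    # Tidy exits with 0 (clean), 1 (warnings only) or 2 (errors).
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout


def link_targets(xhtml: str) -> list:
    """Parse XHTML into a DOM and return the href of every <a> element."""
    dom = minidom.parseString(xhtml)
    return [
        a.getAttribute("href")
        for a in dom.getElementsByTagName("a")
        if a.hasAttribute("href")
    ]
```

Typical use would be `link_targets(tidy_to_xhtml(raw_html))`.  Once the
markup is well-formed XML, any XML tool (XPath included) can take over
from there.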

- michael

--
- michael dykman
- [EMAIL PROTECTED]
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
