RE: [Jprogramming] Parsing HTML in J

Oleg Kobchenko Fri, 05 Jan 2007 12:06:37 -0800

--- Stefano Lanzavecchia <[EMAIL PROTECTED]> wrote:

> > I tried to use Oleg's XML parser, but no go:  I'm not dealing with
> > XHTML;
> 
> Have you considered the idea of "fixing" the HTML before you process it?
> There's a little piece of software called "HTML Tidy" which comes in various
> flavours (executable, DLL, .NET DLL, Perl, Python, Java) that takes real
> life HTML and turns it into XHTML which you should then be able to process
> with an unforgiving XML parser: http://tidy.sourceforge.net/


Yes, though parsing HTML into a DOM is a thankless job.

So a regex is more practical, especially when you are looking for
particular things, which is mostly the case. I occasionally use one
to fetch a list of URLs from HTML.
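
To make the URL case concrete, here is a minimal sketch in Python rather than J. The pattern, the function name extract_urls and the sample markup are all made up for illustration, and it assumes the href values are quoted. It is not the code from the original post.

```python
import re

# Hypothetical pattern: captures the value of href="..." or href='...'.
# It assumes quoted attribute values and does not try to parse the tags.
HREF_RE = re.compile(r'''href\s*=\s*["']([^"']+)["']''', re.IGNORECASE)

def extract_urls(html):
    """Return every quoted href value found in html, in document order."""
    return HREF_RE.findall(html)

# Made-up sample: mixed-case tags and mixed quoting, as in real-life HTML.
sample = ('<p><A HREF="http://www.jsoftware.com/">J</a> '
          "<a href='/forums.htm'>forums</a>")
print(extract_urls(sample))
```

Unquoted attribute values and links assembled by scripts would slip past this, which is the usual trade-off against running the page through a tidy-then-parse pipeline first.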


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
