--- Stefano Lanzavecchia <[EMAIL PROTECTED]> wrote:
> > I tried to use Oleg's XML parser, but no go: I'm not dealing with
> > XHTML;
>
> Have you considered the idea of "fixing" the HTML before you process it?
> There's a little piece of software called "HTML Tidy" which comes in various
> flavours (executable, DLL, .NET DLL, Perl, Python, Java) that takes real
> life HTML and turns it into XHTML which you should then be able to process
> with an unforgiving XML parser: http://tidy.sourceforge.net/

Yes, though parsing HTML into a DOM is a thankless job. So a regex is more
practical, especially when looking for particular things, which is mostly
the case. I use it occasionally to fetch a list of URLs from HTML.

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
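
To illustrate the regex approach to fetching URLs mentioned above, here is a
minimal sketch in Python (not J, and not the poster's actual code). The pattern,
the name extract_urls and the sample string are all made up for illustration.
It only picks up quoted href attributes, so unquoted values and URLs in other
attributes are missed.

```python
import re

# Assumed pattern: href="..." or href='...', case-insensitive.
# Group 1 is the opening quote, group 2 is the URL between the quotes.
HREF_RE = re.compile(r'href\s*=\s*(["\'])(.*?)\1', re.IGNORECASE | re.DOTALL)

def extract_urls(html):
    """Return the quoted href values found in html, in document order."""
    return [m.group(2) for m in HREF_RE.finditer(html)]

# Hypothetical sample input with mixed tag case and both quote styles.
sample = (
    '<p><a href="http://tidy.sourceforge.net/">Tidy</a> '
    "<A HREF='http://www.jsoftware.com/forums.htm'>J forums</A></p>"
)

if __name__ == "__main__":
    for url in extract_urls(sample):
        print(url)
```

This is good enough for pulling links out of a page you know reasonably well.
For arbitrary real-life HTML, the Tidy-then-parse route quoted above is the
more robust one.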
