Sean Snyders wrote:

> Hi Anyone ?
>
> I would like to know whether there exists an HTML Parser written in Java
> (source available or bin and interface info) I'm building an app and
> don't want to write one myself (saving time/ cutting costs etc)
>
> Can someone please direct me to site(s) that have such info/apps or can
> provide me with info of the like.
>
> Thanx,
>
> Sean.
>

One HTML parser that I have played with a little is available at
http://www.openxml.org; you need to get the patched version to get one that
works.

You should also be aware that many parsers choke on much of the badly formed
HTML that is out there.  The fact that browsers are so forgiving (too
forgiving, IMHO) is nice for someone writing HTML by hand, but it creates
tremendous problems for programs that try to parse the result.  You might
consider running the incoming HTML through a program like the "tidy" utility
(available at http://www.w3.org) to clean up some of these issues before you
try to parse it.
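
To make the problem concrete, here is a minimal sketch using only the XML
parser that ships with the JDK (it is not the openxml.org parser, and it does
not call tidy).  The class name StrictParseDemo and both sample strings are
made up for illustration.  The first string is the kind of markup a browser
renders without complaint; the second is a hand-cleaned version, standing in
for what a cleanup pass aims to produce.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: checks whether a strict XML parser accepts markup.
class StrictParseDemo {

    /** Returns true if the JDK's XML parser accepts the markup as well-formed. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow parse errors quietly instead of logging them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input as not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unclosed <br> and <b>: fine in a browser, fatal for a strict parser.
        String sloppy = "<p>one<br><b>two</p>";
        // Same content with every element closed and properly nested.
        String cleaned = "<p>one<br/><b>two</b></p>";

        System.out.println(isWellFormed(sloppy));   // prints "false"
        System.out.println(isWellFormed(cleaned));  // prints "true"
    }
}
```

A check like this can also serve as a cheap gate: if the input already parses,
skip the cleanup step; if not, clean it up first and then parse.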

Craig McClanahan
