http://java-source.net/open-source/html-parsers
"Mark Benussi" <[EMAIL PROTECTED]>
09/03/2005 04:24 AM
Please respond to "Struts Users Mailing List" <user@struts.apache.org>

To "'Struts Users Mailing List'" <user@struts.apache.org>, "'Tomcat Users List'" <tomcat-user@jakarta.apache.org>
cc
Subject [OT Friday] Parse HTML file to underlying text

I know I missed the Friday deadline but...

Has anyone any recommendations for parsing HTML? I use Lucene, and its example has its own HTML parser, but I was wondering whether anyone has used an existing project, or whether there is some built-in functionality in an Apache lib, to convert

    <p>Hello <i>World</i></p>

to

    Hello World

Your thoughts are appreciated.
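
For the conversion asked about above, here is a minimal sketch that needs no third-party library. It is not an Apache lib and not the Lucene demo parser; it relies on the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator). The class name HtmlToText and the method extractText are made up for illustration. Text chunks are joined with a single space, which suits indexing but would split a word that is interrupted by an inline tag.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (name chosen for this example only).
final class HtmlToText {

    // Returns the text content of the given HTML with all tags dropped.
    static String extractText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();

        // The parser invokes handleText once per run of character data.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Assumption: separating chunks with a space is acceptable;
                // it keeps words on either side of a tag from running together.
                text.append(data).append(' ');
            }
        };

        // Third argument 'true' tells the parser to ignore charset declarations.
        new ParserDelegator().parse(new StringReader(html), callback, true);

        // Collapse runs of whitespace and trim the ends.
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extractText("<p>Hello <i>World</i></p>"));
    }
}
```

The JDK parser is lenient about malformed markup, but for heavily broken real-world pages one of the dedicated parsers on the list linked above may cope better.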