http://java-source.net/open-source/html-parsers





"Mark Benussi" <[EMAIL PROTECTED]>
09/03/2005 04:24 AM
Please respond to: "Struts Users Mailing List" <user@struts.apache.org>
To: "'Struts Users Mailing List'" <user@struts.apache.org>, "'Tomcat Users List'" <tomcat-user@jakarta.apache.org>
Subject: [OT Friday] Parse HTML file to underlying text






I know I missed the Friday deadline but...

 

Does anyone have recommendations for parsing HTML? I use Lucene, and its
example ships with its own HTML parser, but I was wondering whether anyone
has used an existing project, or whether an Apache library has built-in
functionality to convert

 

<p>Hello <i>World</i></p>

 

to

 

Hello World

 

Your thoughts are appreciated.
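
For the conversion asked about above, one option that needs no extra library is the HTML parser bundled with the JDK's Swing classes. The following is only a minimal sketch under that assumption, not a recommendation from the thread: the class name HtmlToText and the method extractText are hypothetical, and the Swing parser targets an old HTML dialect, so messy real-world pages may call for one of the dedicated parsers listed at the URL above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text nodes of an HTML fragment
// using the parser that ships with the JDK (javax.swing.text.html).
class HtmlToText {

    static String extractText(String html) throws IOException {
        final StringBuilder out = new StringBuilder();

        // The parser invokes handleText once per run of character data;
        // tags, attributes and comments go to other callbacks that are
        // left at their default no-op implementations here.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data);
            }
        };

        try (Reader reader = new StringReader(html)) {
            // The third argument (ignoreCharSet = true) tells the parser
            // not to act on any charset declared inside the document.
            new ParserDelegator().parse(reader, callback, true);
        }
        return out.toString();
    }
}
```

Calling HtmlToText.extractText("<p>Hello <i>World</i></p>") returns the text of the fragment with the markup removed. Whitespace handling between inline elements is up to the parser, so it may be worth normalising spaces afterwards before handing the text to Lucene.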

