Check out the HTMLParser. If you look at its main function, around line 297 on the trunk, the parse.getText() call returns exactly what you want. Follow the declarations from there to see how the text is obtained.
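
If you just want to see the idea outside of Nutch, here is a rough sketch that strips tags using only the HTML parser that ships with the JDK (javax.swing.text.html). This is not Nutch's HTMLParser and not how parse.getText() is implemented. The class name TagStripper and the method extractText are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of Nutch: collects the text nodes of an
// HTML document and drops the markup.
class TagStripper {

    static String extractText(String html) {
        final StringBuilder text = new StringBuilder();

        // The callback is invoked once per text run between tags.
        HTMLEditorKit.ParserCallback collector = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Separate runs with a space. This is a simplification:
                // it also inserts a space between adjacent inline elements.
                text.append(data).append(' ');
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declaration,
            // since the input is already a decoded String.
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Collapse runs of whitespace and trim the ends.
        return text.toString().replaceAll("\\s+", " ").trim();
    }
}
```

Calling TagStripper.extractText(...) on a page's HTML string gives you the text with the tags removed. Inside Nutch you would use parse.getText() instead, which goes through the plugin's own DOM handling.
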

On 3/23/07, Anton Beza <[EMAIL PROTECTED]> wrote:
> Does Nutch have the ability to filter out HTML tags from a web page and
> return the raw text from that page?
>
> Thanks
> -Anton

--
Ricardo J. Méndez
http://ricardo.strangevistas.net/

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
