Take a look at HTMLParser.  In its main function, around line 297 on
trunk, the parse.getText() call returns exactly the text you want.
Follow the declarations from there to see how it's obtained.
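
If it helps to see the general idea outside of Nutch, here is a minimal
standalone sketch. It does not use Nutch at all: it relies on the HTML
parser bundled with the JDK (javax.swing.text.html), and the TagStripper
class and extractText method are hypothetical names made up for this
example. Nutch's own parser is more involved, so treat this only as an
illustration of collecting the text and dropping the tags.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of Nutch.
class TagStripper {

    /** Returns the text found in an HTML string, with the tags left out. */
    static String extractText(String html) {
        final StringBuilder text = new StringBuilder();

        // The JDK parser calls handleText once per run of text it finds
        // between tags; everything else (tags, comments) is ignored here.
        HTMLEditorKit.ParserCallback collector = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' ');   // keep separate text runs apart
                }
                text.append(data);
            }
        };

        try (Reader reader = new StringReader(html)) {
            // Third argument: ignore any charset declared in the document.
            new ParserDelegator().parse(reader, collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1>"
                    + "<p>Some <b>bold</b> text.</p></body></html>";
        System.out.println(extractText(html));
    }
}
```

Inside Nutch you would not need any of this, since parse.getText()
already hands back the extracted text.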

On 3/23/07, Anton Beza <[EMAIL PROTECTED]> wrote:
> Does Nutch have the ability to filter out HTML tags from a web page and
> return the raw text from that page?
>
> Thanks
> -Anton
>


-- 


Ricardo J. Méndez
http://ricardo.strangevistas.net/

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
