You have to build a special HTML parser that filters out the junk.
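
A rough sketch of the idea, in Python using only the standard library's html.parser. This is not Nutch code, just an illustration of keeping the wanted parts and dropping the rest. The class names "author" and "article-body" and the helper names are made up for the example; every site uses its own markup, so the rules would need to be adapted per site.

```python
from html.parser import HTMLParser

# Hypothetical mapping from a CSS class to the index field it feeds.
WANTED_CLASSES = {"author": "author", "article-body": "article"}

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ArticleExtractor(HTMLParser):
    """Collects text only from <title> and from elements with a wanted class."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fields = {"title": [], "author": [], "article": []}
        self._field = None  # field currently being captured, if any
        self._depth = 0     # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            # Already inside a wanted element: just track nesting.
            self._depth += 1
            return
        if tag == "title":
            self._field = "title"
        else:
            classes = (dict(attrs).get("class") or "").split()
            for cls in classes:
                if cls in WANTED_CLASSES:
                    self._field = WANTED_CLASSES[cls]
                    break
        if self._field is not None:
            self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Left the wanted element; everything after it is junk again.
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.fields[self._field].append(data)


def extract(html):
    """Return a dict of field name -> whitespace-normalized text."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return {name: " ".join("".join(parts).split())
            for name, parts in parser.fields.items()}
```

Calling extract(page_source) gives a dict with "title", "author" and "article" keys; text in any other element (ads, sidebars, navigation) never reaches it, so only those three fields would be handed to the indexer.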

2007/3/9, d e <[EMAIL PROTECTED]>:

If I'm indexing a news article, I want to avoid getting the junk (anything
other than the title, author and article) into the index. I want to avoid
getting the advertisements, etc. How do I do that sort of thing?

What parts of what manual should I be reading so I will know how to do this
sort of thing?

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
