I'm indexing a lot of archive pages. Each one contains both a link to the original article and part of that article's text.

So ideally I want to crawl the "parent" archive page and index everything it links to, but I don't actually want to index the "parent" page itself.

I hope that makes sense...

Is this possible? I'm using the intranet crawling method.

Many thanks,

Dean


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
