[Nutch-general] Crawling a page for links, but not indexing it

Dean Elwood Thu, 17 Nov 2005 09:35:01 -0800

I'm indexing a lot of pages which are archives: each one contains both a link to the original article and part of the text of that article.

So ideally I want to crawl the "parent" archive page and index everything it links to, but I don't want to index the "parent" page itself.
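A minimal sketch of the behaviour being asked for, written in plain Python rather than against Nutch's own Java API. Every page is fetched and its outlinks are followed, but pages that a hypothetical `is_archive` URL test flags as "parent" archive pages are never added to the index. The in-memory link graph, `is_archive` and `crawl` are illustrative assumptions, not Nutch classes or configuration.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for fetched pages:
# URL -> (page text, outlinks). These are not Nutch data structures.
PAGES = {
    "http://example.com/archive/2005-11": (
        "archive summary",
        ["http://example.com/a1", "http://example.com/a2"],
    ),
    "http://example.com/a1": ("article one", []),
    "http://example.com/a2": ("article two", ["http://example.com/archive/2005-11"]),
}


def is_archive(url):
    # Assumption: archive ("parent") pages can be recognised by a URL pattern.
    return "/archive/" in url


def crawl(seeds, pages):
    """Breadth-first crawl that follows every page's links but skips indexing archives."""
    index = {}
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # unknown URL: nothing to fetch in this toy graph
        text, outlinks = pages[url]
        # Outlinks are always followed, including those found on archive pages.
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                queue.append(link)
        # Only non-archive pages are added to the index.
        if not is_archive(url):
            index[url] = text
    return index


if __name__ == "__main__":
    idx = crawl(["http://example.com/archive/2005-11"], PAGES)
    print(sorted(idx))
```

The design choice is simply to separate "follow links" from "add to index" so that the crawl frontier and the index are decided independently; how (or whether) the same split can be configured in a given Nutch version is not shown here.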


I hope that makes sense...

Is this possible? I'm using the intranet crawling method.

Many thanks,

Dean



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
