Re: Crawling a page for links, but not indexing it

Dean Elwood Thu, 17 Nov 2005 13:56:45 -0800

Hi - thanks Jake, but I'm not able to insert those tags into the pages.


Is there any way that I can do this from the Nutch side?

Thanks,

Dean

----- Original Message -----
From: "Vanderdray, Jacob" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, November 17, 2005 5:46 PM
Subject: RE: Crawling a page for links, but not indexing it

Dean,

I'm not sure if the Nutch crawler actually supports it, but you
should be able to use a robots noindex meta tag in the archive pages.

See http://www.robotstxt.org/wc/meta-user.html for more information.

Jake.
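
For illustration, here is a minimal Python sketch of what the robots meta
tag suggested above asks a crawler to do: with "noindex, follow" the page
is left out of the index, but its links are still followed. This is not
Nutch code, and the thread does not confirm whether Nutch honours the tag.
The names RobotsMetaParser, robots_directives and ARCHIVE_PAGE, and the
sample archive page, are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects index/follow permissions from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        # Defaults when no robots meta tag is present: index and follow.
        self.index = True
        self.follow = True

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        directives = {d.strip().lower() for d in content.split(",")}
        # "none" is shorthand for "noindex, nofollow".
        if "noindex" in directives or "none" in directives:
            self.index = False
        if "nofollow" in directives or "none" in directives:
            self.follow = False


def robots_directives(html):
    """Return (may_index, may_follow_links) for an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.index, parser.follow


# Hypothetical archive page: keep it out of the index, but follow its links.
ARCHIVE_PAGE = """<html><head>
<meta name="robots" content="noindex, follow">
</head><body><a href="/articles/1.html">Article 1</a></body></html>"""

may_index, may_follow = robots_directives(ARCHIVE_PAGE)
print("index this page:", may_index)
print("follow its links:", may_follow)
```

A crawler applying this check would skip adding the archive page to its
index while still queueing the article links found on it, which is the
behaviour asked for in the original question.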

-----Original Message-----
From: Dean Elwood [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 17, 2005 12:34 PM
To: [email protected]
Subject: Crawling a page for links, but not indexing it

I'm indexing a lot of pages which are archives - they contain both a
link to the original article, and part of the text of the original article.


So ideally I want to crawl the "parent" archive page and index
everything it links to, but I don't actually want to index the "parent" page itself.


I hope that makes sense...

Is this possible? I'm using the intranet crawling method.

Many thanks,

Dean
