Hi,

I took a look at the recrawl script and noticed that all the steps except URL
injection are repeated on each subsequent indexing run, and I wondered why we
would generate new segments.
Is it possible to run only the fetch and update steps on all the previous
segments ($s1..$sn), followed by the invertlinks and index steps?

Thanks.
Alex.

-----Original Message-----
From: Julien Nioche <[email protected]>
To: user <[email protected]>
Sent: Wed, Jun 1, 2011 12:59 am
Subject: Re: keeping index up to date

You should use the adaptive fetch schedule. See
http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/
for details.
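For context on why this helps with unchanged files such as the PDFs: with the adaptive schedule (enabled by setting db.fetch.schedule.class to org.apache.nutch.crawl.AdaptiveFetchSchedule), a page found unchanged on re-fetch has its fetch interval lengthened and a changed page has it shortened, so static documents are fetched less and less often. The Python below is only a simplified sketch of that rule, not Nutch's own code. The rate and bound defaults are assumed from the db.fetch.schedule.adaptive.* properties in nutch-default.xml and should be checked against your version, and the real class also applies a sync-delta correction that is left out here.

```python
# Simplified sketch of an adaptive fetch schedule (not Nutch's implementation).
# Defaults are assumed to mirror db.fetch.schedule.adaptive.* in nutch-default.xml.
from enum import Enum

DAY = 24 * 3600.0  # seconds in a day


class FetchStatus(Enum):
    MODIFIED = "modified"
    NOT_MODIFIED = "not_modified"


def next_interval(interval, status, inc_rate=0.4, dec_rate=0.2,
                  min_interval=60.0, max_interval=365 * DAY):
    """Return the new re-fetch interval in seconds, clamped to [min, max]."""
    if status is FetchStatus.MODIFIED:
        interval *= 1.0 - dec_rate  # content changed: re-fetch sooner
    else:
        interval *= 1.0 + inc_rate  # content unchanged: back off
    return max(min_interval, min(max_interval, interval))


if __name__ == "__main__":
    interval = 30 * DAY  # assumed starting interval (db.fetch.interval.default)
    for fetch in range(1, 6):
        interval = next_interval(interval, FetchStatus.NOT_MODIFIED)
        print(f"after unchanged fetch {fetch}: {interval / DAY:.1f} days")
```

Running it shows the interval for an unchanged page growing on every fetch until it reaches the maximum, which is the behaviour that keeps static PDFs from being fetched again and again.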

On 1 June 2011 07:18, <[email protected]> wrote:

> Hello,
>
> I use Nutch 1.2 to index about 3000 sites. One of them has about 1500 PDF
> files which do not change over time.
> I wondered if there is a way of configuring Nutch not to fetch unchanged
> documents again and again, but to keep the old index entries for them.
>
> Thanks.
> Alex.
>

-- 
*Open Source Solutions for Text Engineering*

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com