Hi,

I took a look at the recrawl script and noticed that all of the steps except
URL injection are repeated on each subsequent recrawl, and I wondered why we
would generate new segments. Is it possible to just run the fetch, updatedb
(for all previous segments $s1..$sn), invertlinks and index steps?

Thanks.
Alex.

-----Original Message-----
From: Julien Nioche <[email protected]>
To: user <[email protected]>
Sent: Wed, Jun 1, 2011 12:59 am
Subject: Re: keeping index up to date

You should use the adaptive fetch schedule. See
http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/ for details.

On 1 June 2011 07:18, <[email protected]> wrote:

> Hello,
>
> I use nutch-1.2 to index about 3000 sites. One of them has about 1500 pdf
> files which do not change over time.
> I wondered if there is a way of configuring nutch not to fetch unchanged
> documents again and again, but keep the old index for them.
>
> Thanks.
> Alex.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
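For reference, a recrawl cycle of the kind being discussed usually looks like the
sketch below. This is a minimal outline, not the exact script Alex mentions: the
crawl/ paths, the depth of 2 and the "ls -t" trick for picking up the newest
segment are assumptions, while the command names are the standard Nutch 1.x ones.

    #!/bin/bash
    # Minimal sketch of a Nutch 1.x recrawl cycle (paths and depth are placeholders).
    CRAWL=crawl
    DEPTH=2

    for ((i = 0; i < DEPTH; i++)); do
      # generate writes a brand new segment holding only the URLs whose fetch
      # time is due; segments are write-once, which is why every cycle adds one
      bin/nutch generate $CRAWL/crawldb $CRAWL/segments
      SEGMENT=$CRAWL/segments/`ls -t $CRAWL/segments | head -1`

      bin/nutch fetch $SEGMENT
      # if fetcher.parse is set to false, a separate parse step is needed here:
      # bin/nutch parse $SEGMENT

      # fold fetch results (status, signatures, newly found links) into the crawldb
      bin/nutch updatedb $CRAWL/crawldb $SEGMENT
    done

    # linkdb and index are rebuilt from all segments, old and new
    bin/nutch invertlinks $CRAWL/linkdb -dir $CRAWL/segments
    bin/nutch index $CRAWL/indexes $CRAWL/crawldb $CRAWL/linkdb $CRAWL/segments/*

The updatedb, invertlinks and index steps do run over the old segments as well,
but the fetch itself always goes through a freshly generated segment; that is
where the fetch schedule decides which URLs are actually due.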

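On the adaptive fetch schedule itself: it is switched on in conf/nutch-site.xml.
A minimal, illustrative snippet, with property names as they appear in
nutch-default.xml (the values are examples, not recommendations, so check them
against your own version):

    <property>
      <name>db.fetch.schedule.class</name>
      <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
    </property>

    <!-- starting interval, in seconds, for newly discovered URLs -->
    <property>
      <name>db.fetch.interval.default</name>
      <value>2592000</value>
    </property>

    <!-- bounds within which the schedule shrinks or grows the per-page
         interval as pages are found modified or unmodified -->
    <property>
      <name>db.fetch.schedule.adaptive.min_interval</name>
      <value>86400</value>
    </property>
    <property>
      <name>db.fetch.schedule.adaptive.max_interval</name>
      <value>31536000</value>
    </property>

With something like this in place, documents that keep coming back unchanged
(such as the static PDF files from the original question) have their fetch
interval pushed towards the maximum, so they are re-fetched less and less often
while remaining in the index.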
