> Hi,
>
> I took a look at the recrawl script and noticed that all the steps except
> URL injection are repeated on each subsequent indexing run, and I wondered
> why we would generate new segments. Is it possible to run the fetch and
> updatedb steps over all the previous segments $s1..$sn, and then the
> invertlinks and index steps?
No, the generator generates a segment with a list of URLs for the fetcher to
fetch. You can, if you like, then merge the segments afterwards (see the
sketch at the bottom of this message).

> Thanks.
> Alex.
>
> -----Original Message-----
> From: Julien Nioche <[email protected]>
> To: user <[email protected]>
> Sent: Wed, Jun 1, 2011 12:59 am
> Subject: Re: keeping index up to date
>
> You should use the adaptive fetch schedule. See
> http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/
> for details.
>
> On 1 June 2011 07:18, <[email protected]> wrote:
> > Hello,
> >
> > I use nutch-1.2 to index about 3000 sites. One of them has about 1500 PDF
> > files which do not change over time.
> > I wondered if there is a way of configuring Nutch not to fetch unchanged
> > documents again and again, but to keep the old index entries for them.
> >
> > Thanks.
> > Alex.
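
To make the answer above concrete, here is a rough sketch of one recrawl
cycle with the Nutch 1.2 command-line tools, assuming a crawl directory laid
out as crawl/crawldb, crawl/segments and crawl/linkdb (the paths, the -topN
value and the number of rounds are just placeholders, not taken from your
script):

#!/bin/sh
CRAWL=crawl

# inject is only needed when you have new seed URLs
bin/nutch inject $CRAWL/crawldb urls

for round in 1 2; do
  # generate writes a brand new segment holding the URLs that are due for fetching
  bin/nutch generate $CRAWL/crawldb $CRAWL/segments -topN 1000
  SEGMENT=`ls -d $CRAWL/segments/* | tail -1`
  bin/nutch fetch $SEGMENT
  bin/nutch parse $SEGMENT                     # skip if fetcher.parse=true
  bin/nutch updatedb $CRAWL/crawldb $SEGMENT
done

# optional: fold the per-round segments into a single merged segment
bin/nutch mergesegs $CRAWL/segments_merged -dir $CRAWL/segments

bin/nutch invertlinks $CRAWL/linkdb -dir $CRAWL/segments
bin/nutch index $CRAWL/indexes $CRAWL/crawldb $CRAWL/linkdb $CRAWL/segments/*
# (or bin/nutch solrindex <solr url> ... if you index into Solr instead)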

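For the adaptive fetch schedule that Julien mentions in the quoted reply, the
switch is made in conf/nutch-site.xml. The property names below are the ones
shipped in nutch-default.xml; the interval values are only examples, so tune
them to how often your sites actually change:

<property>
  <name>db.fetch.schedule.class</name>
  <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
</property>
<property>
  <name>db.fetch.interval.default</name>
  <!-- starting interval: 30 days -->
  <value>2592000</value>
</property>
<property>
  <name>db.fetch.schedule.adaptive.min_interval</name>
  <!-- never refetch a page more often than once a day -->
  <value>86400</value>
</property>
<property>
  <name>db.fetch.schedule.adaptive.max_interval</name>
  <!-- pages that never change back off to at most 90 days -->
  <value>7776000</value>
</property>
<property>
  <name>db.fetch.schedule.adaptive.inc_rate</name>
  <!-- interval grows by 40% each time a page comes back unmodified -->
  <value>0.4</value>
</property>
<property>
  <name>db.fetch.schedule.adaptive.dec_rate</name>
  <!-- interval shrinks by 20% each time a page has changed -->
  <value>0.2</value>
</property>

Whether a page counts as modified is decided from its signature, so documents
that never change, like the PDFs in the original question, should drift
towards max_interval and be fetched far less often.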
