We have the injector for that ;)
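The injector is the `bin/nutch inject <crawldb> <url_dir>` step. Point it at the existing crawldb and at a directory holding a text file of new URLs, and the next generate/fetch/updatedb round will pick them up. As I understand the Nutch 1.x Injector, URLs that are already in the crawldb keep their existing entry rather than being reset. The Python below is only a simplified, hypothetical in-memory model of that merge. The `CrawlDatum` class, the status strings and the 30-day default interval are illustrative stand-ins, not Nutch's API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class CrawlDatum:
    # Hypothetical, heavily simplified stand-in for a crawldb entry.
    status: str
    fetch_interval_days: float


def inject(crawldb: Dict[str, CrawlDatum], new_urls: Iterable[str],
           default_interval_days: float = 30.0) -> Dict[str, CrawlDatum]:
    """Return a new crawldb with unseen URLs added as unfetched entries.

    URLs that are already known keep their existing status and interval,
    modelling the injector's "don't overwrite existing entries" behaviour.
    """
    merged = dict(crawldb)  # leave the input mapping untouched
    for raw in new_urls:
        url = raw.strip()
        if not url or url in merged:
            continue  # skip blank lines and already-known URLs
        merged[url] = CrawlDatum(status="db_unfetched",
                                 fetch_interval_days=default_interval_days)
    return merged
```

In this model, re-injecting a URL you already crawled is harmless. Only genuinely new URLs enter the next round as unfetched.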
> Hello,
>
> One more question. Is there a way of adding new URLs to a crawldb created in
> previous crawls so that they are included in subsequent recrawls?
>
> Thanks.
> Alex.
>
> -----Original Message-----
> From: lewis john mcgibbney <[email protected]>
> To: user <[email protected]>; markus.jelsma <[email protected]>
> Sent: Tue, Jun 7, 2011 1:16 pm
> Subject: Re: keeping index up to date
>
> Hi,
>
> To add to Markus' comments: if you look at the script, you will see it is
> written so that, when run in safe mode, it protects us against errors that
> may occur. If one does occur, we can recover the segments and take
> appropriate action to resolve it.
>
> On Tue, Jun 7, 2011 at 9:01 PM, Markus Jelsma <[email protected]> wrote:
>
> > Hi,
> >
> > > I took a look at the recrawl script and noticed that all the steps
> > > except URL injection are repeated on each subsequent indexing run,
> > > and wondered why we would generate new segments. Is it possible to
> > > run only the fetch and update steps for all previous segments
> > > $s1..$sn, followed by the invertlinks and index steps?
> >
> > No. The generator creates a segment containing the list of URLs for
> > the fetcher to fetch. You can merge segments afterwards if you like.
> >
> > > Thanks.
> > > Alex.
> > >
> > > -----Original Message-----
> > > From: Julien Nioche <[email protected]>
> > > To: user <[email protected]>
> > > Sent: Wed, Jun 1, 2011 12:59 am
> > > Subject: Re: keeping index up to date
> > >
> > > You should use the adaptive fetch schedule. See
> > > http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/
> > > for details.
> > >
> > > On 1 June 2011 07:18, <[email protected]> wrote:
> > > > Hello,
> > > >
> > > > I use nutch-1.2 to index about 3000 sites. One of them has about
> > > > 1500 PDF files which do not change over time.
> > > > I wondered if there is a way of configuring Nutch not to fetch
> > > > unchanged documents again and again, but to keep the old index
> > > > entries for them.
> > > >
> > > > Thanks.
> > > > Alex.
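Julien's suggestion is enabled by setting `db.fetch.schedule.class` to `org.apache.nutch.crawl.AdaptiveFetchSchedule` in `nutch-site.xml`. Roughly, that schedule lengthens a page's refetch interval each time the page is found unchanged and shortens it when the page has changed, within configured bounds. Static files such as the 1500 PDFs are therefore fetched less and less often. The sketch below is a simplified Python model of that rule, not the Nutch class itself. Its rates and bounds are meant to mirror `db.fetch.schedule.adaptive.inc_rate`, `dec_rate`, `min_interval` and `max_interval`, but treat the numbers as assumptions and check `nutch-default.xml` for your version. The real class's sync-delta adjustment is also left out.

```python
from typing import List

SECONDS_PER_DAY = 86400.0


def next_interval(interval_s: float, modified: bool,
                  inc_rate: float = 0.4, dec_rate: float = 0.2,
                  min_interval_s: float = 60.0,
                  max_interval_s: float = 365 * SECONDS_PER_DAY) -> float:
    """Simplified adaptive rule: shrink on change, grow when unchanged."""
    if modified:
        interval_s *= 1.0 - dec_rate   # page changed: check back sooner
    else:
        interval_s *= 1.0 + inc_rate   # page unchanged: back off
    # Keep the interval within the configured bounds.
    return max(min_interval_s, min(interval_s, max_interval_s))


def simulate_unchanged(rounds: int, start_days: float = 30.0) -> List[float]:
    """Interval in days after each of `rounds` fetches of a static file."""
    interval = start_days * SECONDS_PER_DAY
    history = []
    for _ in range(rounds):
        interval = next_interval(interval, modified=False)
        history.append(interval / SECONDS_PER_DAY)
    return history
```

With these assumed rates, a PDF that never changes moves from a 30-day interval to about 42, 58.8 and 82.3 days over three rounds, and keeps growing until it reaches the upper bound.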

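A rough model of Markus's point about the generator: each generate run selects the crawldb entries that are due at that moment (highest score first, optionally capped by a top-N limit) and writes them to a fresh segment. The list therefore changes from round to round as fetch times move forward, while re-running fetch on old segments would essentially replay the URL lists chosen back then. The `Entry` class and `generate` function below are hypothetical Python stand-ins, not Nutch's Generator API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    # Hypothetical crawldb record: URL, time it becomes due, and its score.
    url: str
    next_fetch: float  # epoch seconds
    score: float = 1.0


def generate(crawldb: List[Entry], now: float,
             top_n: Optional[int] = None) -> List[str]:
    """Return the fetch list for a new segment: due URLs, best score first."""
    due = [e for e in crawldb if e.next_fetch <= now]
    due.sort(key=lambda e: e.score, reverse=True)
    if top_n is not None:
        due = due[:top_n]  # cap the segment size, like a top-N limit
    return [e.url for e in due]
```

Calling `generate` at a later `now` yields a different list, which is why each recrawl round starts with a new segment.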
