Thank you very much. But that is just a parameter for specifying the interval between re-crawls. The problem is that Nutch re-crawl does not work with the default crawl script.

On Wed, Jun 4, 2014 at 6:49 PM, S.L <[email protected]> wrote:

> Ali,
>
> If you have not found this out yet, I was referring to
> db.fetch.interval.max.
>
> Sent from my HTC
>
> ----- Reply message -----
> From: "Ali Nazemian" <[email protected]>
> To: <[email protected]>
> Subject: Incremental crawling with nutch
> Date: Mon, Jun 2, 2014 4:52 AM
>
> Hi,
> Could you please explain more?
> What parameter? How can I do that?!
> Regards.
>
>
> On Mon, Jun 2, 2014 at 3:42 AM, S.L <[email protected]> wrote:
>
> > Hi Ali
> >
> > Please see the nutch-site.xml parameters one of them does that.
> >
> > Sent from my HTC
> >
> > ----- Reply message -----
> > From: "Ali Nazemian" <[email protected]>
> > To: <[email protected]>
> > Subject: Incremental crawling with nutch
> > Date: Sun, Jun 1, 2014 10:46 AM
> >
> > Hi everybody,
> > I am going to use nutch for crawling some news web site. These websites
> > will be updated regularly. Therefore I should recrawl them at least every 2
> > hours. But the problem is I want to have incremental re-crawl, it means
> > nutch should crawl only the urls that are new and not fetched before
> > (except the main page of each site for extracting new urls). I want in each
> > re-crawling process only the new URLs fetched and send to solr for
> > indexing. Would somebody guide me through this scenario with nutch 1.8?
> > Best regards.
> >
> > --
> > A.Nazemian
> >
>
>
> --
> A.Nazemian
>

--
A.Nazemian

