Hi,

Could you please explain in more detail? Which parameter do you mean, and how can I do that?

Regards.

On Mon, Jun 2, 2014 at 3:42 AM, S.L <[email protected]> wrote:

> Hi Ali
>
> Please see the nutch-site.xml parameters one of them does that.
>
> Sent from my HTC
>
> ----- Reply message -----
> From: "Ali Nazemian" <[email protected]>
> To: <[email protected]>
> Subject: Incremental crawling with nutch
> Date: Sun, Jun 1, 2014 10:46 AM
>
> Hi everybody,
> I am going to use nutch for crawling some news web site. These websites
> will be updated regularly. Therefore I should recrawl them at least every 2
> hours. But the problem is I want to have incremental re-crawl, it means
> nutch should crawl only the urls that are new and not fetched before
> (except the main page of each site for extracting new urls). I want in each
> re-crawling process only the new URLs fetched and send to solr for
> indexing. Would somebody guide me through this scenario with nutch 1.8?
> Best regards.
>
> --
> A.Nazemian

--
A.Nazemian

