So you mean the only difference (besides some parameters that should be set in nutch-site.xml) is using "nutch generate -adddays" instead of "nutch generate"? What about the other steps? Could you please provide a step-by-step guide? Regards.
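As a rough illustration of what "-adddays" is understood to do (a simplified model in Python, not Nutch code): the generate step compares each URL's next scheduled fetch time against the current time shifted forward by the given number of days, so pages that are not yet due get selected anyway. The function name `is_due`, the dates, and the 30-day interval are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def is_due(fetch_time, now, add_days=0):
    """Simplified model: a URL is selected when its next scheduled fetch
    time is not later than the current time shifted forward by add_days."""
    effective_now = now + timedelta(days=add_days)
    return fetch_time <= effective_now


# Hypothetical values: a page fetched just now with a 30-day re-fetch interval.
now = datetime(2014, 6, 7, 16, 0)
next_fetch = now + timedelta(days=30)

print(is_due(next_fetch, now))               # plain "generate": False, not due yet
print(is_due(next_fetch, now, add_days=31))  # "generate -adddays 31": True
```

In this model the rest of the cycle (fetch, parse, updatedb, index) is unchanged; only the selection of URLs in the generate step differs.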
On Sat, Jun 7, 2014 at 4:20 PM, Bayu Widyasanyata <bwidyasany...@gmail.com> wrote:

> Hi Ali,
>
> OK, I will share using my current script.
> I sometimes use "-adddays" parameter on "nutch generate" steps to force
> recrawling.
>
> Thanks.
>
> On Fri, Jun 6, 2014 at 11:02 PM, Ali Nazemian <alinazem...@gmail.com> wrote:
>
> > Dear Bayu,
> > Would you please also provide me what procedure you are going to use for
> > testing recrawl? maybe I do some steps wrong.
> > Regards.
> >
> > On Fri, Jun 6, 2014 at 7:01 PM, Bayu Widyasanyata <bwidyasany...@gmail.com> wrote:
> >
> > > Just curious, I will go back in lab and proof it....
> > >
> > > ---
> > > wassalam,
> > > [bayu]
> > >
> > > /sent from Android phone/
> > > On Jun 6, 2014 5:37 PM, "Ali Nazemian" <alinazem...@gmail.com> wrote:
> > >
> > > > Dear Bayu,
> > > > Hi,
> > > > I already read that post about recrawling. My problem is nutch does not
> > > > works in the same way that this post mentioned.
> > > > Regards.
> > > >
> > > > On Fri, Jun 6, 2014 at 2:14 PM, Bayu Widyasanyata <bwidyasany...@gmail.com> wrote:
> > > >
> > > > > Hi Ali,
> > > > >
> > > > > This blog [0] may helps.
> > > > >
> > > > > [0] http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/
> > > > >
> > > > > On Thu, Jun 5, 2014 at 12:32 AM, Ali Nazemian <alinazem...@gmail.com> wrote:
> > > > >
> > > > > > Thank you very much. But it is just a parameter for specifying the
> > > > > > interval between re-crawls. The problem is nutch re-crawl does not
> > > > > > works with default crawl script.
> > > > > >
> > > > > > On Wed, Jun 4, 2014 at 6:49 PM, S.L <simpleliving...@gmail.com> wrote:
> > > > > >
> > > > > > > Ali,
> > > > > > >
> > > > > > > If you have not found this out yet, I was referring to
> > > > > > > db.fetch.interval.max.
> > > > > > >
> > > > > > > Sent from my HTC
> > > > > > >
> > > > > > > ----- Reply message -----
> > > > > > > From: "Ali Nazemian" <alinazem...@gmail.com>
> > > > > > > To: <user@nutch.apache.org>
> > > > > > > Subject: Incremental crawling with nutch
> > > > > > > Date: Mon, Jun 2, 2014 4:52 AM
> > > > > > >
> > > > > > > Hi,
> > > > > > > Could you please explain more?
> > > > > > > What parameter? How can I do that?!
> > > > > > > Regards.
> > > > > > >
> > > > > > > On Mon, Jun 2, 2014 at 3:42 AM, S.L <simpleliving...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Ali
> > > > > > > >
> > > > > > > > Please see the nutch-site.xml parameters one of them does that.
> > > > > > > >
> > > > > > > > Sent from my HTC
> > > > > > > >
> > > > > > > > ----- Reply message -----
> > > > > > > > From: "Ali Nazemian" <alinazem...@gmail.com>
> > > > > > > > To: <user@nutch.apache.org>
> > > > > > > > Subject: Incremental crawling with nutch
> > > > > > > > Date: Sun, Jun 1, 2014 10:46 AM
> > > > > > > >
> > > > > > > > Hi everybody,
> > > > > > > > I am going to use nutch for crawling some news web site. These
> > > > > > > > websites will be updated regularly. Therefore I should recrawl
> > > > > > > > them at least every 2 hours. But the problem is I want to have
> > > > > > > > incremental re-crawl, it means nutch should crawl only the urls
> > > > > > > > that are new and not fetched before (except the main page of
> > > > > > > > each site for extracting new urls). I want in each re-crawling
> > > > > > > > process only the new URLs fetched and send to solr for
> > > > > > > > indexing. Would somebody guide me through this scenario with
> > > > > > > > nutch 1.8?
> > > > > > > > Best regards.
> > > > > > > >
> > > > > > > > --
> > > > > > > > A.Nazemian
> > > > > > >
> > > > > > > --
> > > > > > > A.Nazemian
> > > > > >
> > > > > > --
> > > > > > A.Nazemian
> > > > >
> > > > > --
> > > > > wassalam,
> > > > > [bayu]
> > > >
> > > > --
> > > > A.Nazemian
> >
> > --
> > A.Nazemian
>
> --
> wassalam,
> [bayu]

--
A.Nazemian