Hi Ali,

Nutch does not really treat a re-crawl as a separate operation: every URL becomes due for fetching again once its fetch interval has elapsed, which is 30 days by default. Usually one keeps Nutch running indefinitely (e.g. via cron), and the URLs are then automatically re-crawled every 30 days.
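To make the scheduling idea concrete, here is a minimal Python sketch. It is not Nutch's actual code, only an illustration of the rule "a URL is due again once its interval has passed". The names `is_due`, `select_due` and the dictionary standing in for the crawl database are hypothetical, and the 30-day constant simply mirrors the default mentioned above.

```python
from datetime import datetime, timedelta

# Assumption: mirrors the 30-day default fetch interval described above.
DEFAULT_FETCH_INTERVAL = timedelta(days=30)


def is_due(last_fetch, now, interval=DEFAULT_FETCH_INTERVAL):
    """Return True when a URL fetched at `last_fetch` should be fetched again."""
    return now >= last_fetch + interval


def select_due(crawldb, now, interval=DEFAULT_FETCH_INTERVAL):
    """Return the sorted URLs whose fetch interval has elapsed.

    `crawldb` is a hypothetical stand-in: a dict mapping URL -> last fetch time.
    """
    return sorted(url for url, last in crawldb.items() if is_due(last, now, interval))


if __name__ == "__main__":
    crawldb = {
        "http://example.com/a": datetime(2014, 5, 1),   # fetched 35 days ago
        "http://example.com/b": datetime(2014, 5, 20),  # fetched 16 days ago
    }
    # Only the first URL is older than 30 days on this date.
    print(select_due(crawldb, now=datetime(2014, 6, 5)))
```

Each scheduled run (for instance from cron) only picks up the URLs that have become due, so running the same crawl cycle repeatedly is what produces the re-crawl.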
Markus

-----Original message-----
> From: Ali Nazemian <[email protected]>
> Sent: Thursday 5th June 2014 21:25
> To: [email protected]
> Subject: re-crawling with nutch 1.8
>
> Hi,
> I recently got familiar with nutch and I want to use nutch for whole web
> crawling. The problem is I did not find any useful tutorial on how to
> re-crawl using nutch. I know that there is some configuration parameter
> that should change for purpose of recrawling, I am aware of them. The thing
> that I dont know is how can I run a crawler for crawl as first step and
> recrawl as the next steps? As far as I found out the default crawl script
> that is provided with nutch could not be used for my purpose. Could
> somebody tell me how can I do that? What are the prerequisites? Do I need
> web application server such as tomcat for this purpose?
> FYI I am using nutch 1.8.
>
> Regards.
>
> --
> A.Nazemian
>

