Hi Senthilkumar,

In short, search for "recrawl" on the Nutch wiki to find an external blog post on recrawling with Nutch. If you have anything to add to the post, contact the author. If, on the other hand, you need clarification on anything, ping us here.
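For reference, here is a rough Python sketch (not Nutch's own Java code) of the per-URL interval arithmetic that AdaptiveFetchSchedule applies. It assumes the default values of the db.fetch.schedule.adaptive.* properties: inc_rate 0.4, dec_rate 0.2, min_interval 60 s and max_interval one year. It also ignores the sync_delta adjustment. The schedule only decides when each URL becomes due again. A crawl cycle (generate/fetch/parse/updatedb) still has to be started by something, such as a cron job or a script.

```python
"""Simplified sketch of AdaptiveFetchSchedule's interval update (sync_delta ignored)."""

# Assumed defaults, mirroring the db.fetch.schedule.adaptive.* properties.
INC_RATE = 0.4                     # grow the interval when a page was NOT modified
DEC_RATE = 0.2                     # shrink the interval when a page WAS modified
MIN_INTERVAL = 60.0                # seconds
MAX_INTERVAL = 365 * 24 * 3600.0   # seconds (one year)

MODIFIED, NOT_MODIFIED, UNKNOWN = "modified", "not_modified", "unknown"


def next_interval(interval: float, state: str) -> float:
    """Return the next refetch interval (seconds) for one URL."""
    if state == MODIFIED:
        interval *= 1.0 - DEC_RATE
    elif state == NOT_MODIFIED:
        interval *= 1.0 + INC_RATE
    # UNKNOWN leaves the interval unchanged; the result is clamped to [MIN, MAX].
    return min(max(interval, MIN_INTERVAL), MAX_INTERVAL)


def next_fetch_time(fetch_time: float, interval: float, state: str) -> tuple[float, float]:
    """Return (next fetch time, new interval), both in seconds."""
    new_interval = next_interval(interval, state)
    return fetch_time + new_interval, new_interval


if __name__ == "__main__":
    two_days = 2 * 24 * 3600.0
    for state in (MODIFIED, NOT_MODIFIED, UNKNOWN):
        print(state, next_interval(two_days, state) / 3600.0, "hours")
```

Start from a two-day interval. Under this sketch, a page that changed becomes due again after about 38.4 hours. A page that did not change becomes due after about 67.2 hours. The generate step then selects only URLs whose fetch time has already passed.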
HTH,
Lewis

On Thursday, April 18, 2013, mesenthil1 <[email protected]> wrote:
> Hi,
>
> Can someone please explain how the following scenario works?
>
> I need to crawl a site with 50K URLs. This site is dynamic and is updated frequently. Assuming it takes 2 days to crawl the site completely, can we have some configuration (fetch schedule or something else) so that once a crawl cycle is complete, the next crawl cycle starts automatically after two days to find the new URLs? If this feature is not available, should we control the repeated crawling of the site manually through some sort of scripting?
>
> Actually, we have more than 50 sites that must be crawled separately. If we need to maintain re-crawling of each site, should we have 50 separate scripts to handle them? Please let us know if anyone has faced this situation.
>
> Thanks,
> Senthil
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Whether-Nutch-AdaptiveFetchSchedule-can-do-recrawling-automatically-tp4056979p4057036.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
*Lewis*

