> I want my crawler to crawl the complete page without setting up schedulers
> at all. Every crawl process should crawl every page again without having
> set up wait intervals.
That's quite easy: remove all crawl data and launch the crawl again.

- Nutch 1.x: remove crawldb, segments, and linkdb
- Nutch 2.x: drop 'webpage' (or similar, depending on the chosen data store)

On 11/24/2012 12:17 PM, Jan Philippe Wimmer wrote:
> Hi there,
>
> how can I avoid the following error:
> -shouldFetch rejected 'http://www.page.com/shop', fetchTime=1356347311285,
> curTime=1353755337755
>
> I want my crawler to crawl the complete page without setting up schedulers at
> all. Every crawl process should crawl every page again without having set up
> wait intervals.
>
> Any solution?
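For Nutch 1.x, a minimal sketch of the reset might look like the following. It assumes your crawl data lives in a directory named "crawl" (adjust the path to your own layout); deleting these three directories discards all fetch-time bookkeeping, so the next crawl run fetches every URL again from scratch:

```shell
# Assumed layout: a Nutch 1.x crawl directory named "crawl".
# Removing crawldb, segments, and linkdb resets the crawl state,
# so every page is fetched again on the next run.
rm -rf crawl/crawldb crawl/segments crawl/linkdb
```

Note this also throws away link and scoring data, not just the fetch schedule, so the first crawl afterwards starts from your seed list again.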

