> I want my crawler to crawl the complete page without setting up schedulers
> at all. Every crawl process should crawl every page again without having
> set up wait intervals.
That's quite easy: remove all crawl data and launch the crawl again.

- Nutch 1.x: remove crawldb, segments, and linkdb
- Nutch 2.x: drop 'webpage' (or similar, depending on the chosen data store)

On 11/24/2012 12:17 PM, Jan Philippe Wimmer wrote:
> Hi there,
>
> how can I avoid the following error:
> -shouldFetch rejected 'http://www.page.com/shop', fetchTime=1356347311285,
> curTime=1353755337755
>
> I want my crawler to crawl the complete page without setting up schedulers at
> all. Every crawl process should crawl every page again without having set up
> wait intervals.
>
> Any solution?
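For Nutch 1.x, a minimal sketch of the reset might look like the following. It assumes your crawl data lives in a directory named "crawl" (adjust the path to your own layout); deleting these three directories discards all fetch-time bookkeeping, so the next crawl run fetches every URL again from scratch:

```shell
# Assumed layout: a Nutch 1.x crawl directory named "crawl".
# Removing crawldb, segments, and linkdb resets the crawl state,
# so every page is fetched again on the next run.
rm -rf crawl/crawldb crawl/segments crawl/linkdb
```

Note this also throws away link and scoring data, not just the fetch schedule, so the first crawl afterwards starts from your seed list again.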

