Hi Marko,

Even with
  http.redirect.max == 0
Nutch still follows redirects: the redirect targets are recorded like
ordinary links and fetched in the next round(s).

> The first fetch seems to download something, but the second generate job
> doesn't appear to produce a new segment,
Are the redirect targets accepted by the URL filter patterns?
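
One way to check this is to pipe the redirect target through the URL filter
checker. The sketch below assumes it is run from the Nutch 1.x runtime
directory and that https://www.foo.com/ is the redirect target (a placeholder
taken from your log line, not a verified URL). A filter rule anchored at
http:// would, for example, not match the https:// target.

```shell
# Assumption: run from the Nutch 1.x runtime directory (e.g. runtime/local).
NUTCH="${NUTCH:-bin/nutch}"
# Placeholder for the presumed redirect target of the seed URL.
TARGET_URL="https://www.foo.com/"

if [ -x "$NUTCH" ]; then
  # URLFilterChecker reads URLs from stdin and echoes each one prefixed
  # with '+' (accepted) or '-' (rejected) by the combined URL filters.
  echo "$TARGET_URL" | "$NUTCH" org.apache.nutch.net.URLFilterChecker -allCombined
else
  echo "bin/nutch not found; run this from the Nutch runtime directory" >&2
fi
```

If the URL comes back with a leading '-', the redirect target is dropped
before it reaches the CrawlDb, which would explain the empty second
generate.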

> How can I look at the crawl db and segment data contents (esp. fetch list)?
> I'm running Nutch in local mode.
% bin/nutch readdb ...
% bin/nutch readseg ...
Both commands print their usage help when called without arguments.
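
A few typical invocations, as a sketch only: the paths crawl/crawldb and
crawl/segments and the output directory names are assumptions, so adjust
them to your setup.

```shell
# Assumptions: run from the Nutch 1.x runtime directory; crawl data lives
# under crawl/ (hypothetical path); output directories do not exist yet.
NUTCH="${NUTCH:-bin/nutch}"
CRAWLDB="crawl/crawldb"
SEGMENTS="crawl/segments"

if [ -x "$NUTCH" ] && [ -d "$CRAWLDB" ] && [ -d "$SEGMENTS" ]; then
  # Status counts (unfetched, fetched, redirects, ...) for the whole CrawlDb
  "$NUTCH" readdb "$CRAWLDB" -stats
  # Plain-text dump of every CrawlDb entry
  "$NUTCH" readdb "$CRAWLDB" -dump crawldb_dump
  # CrawlDb record of a single URL
  "$NUTCH" readdb "$CRAWLDB" -url http://www.foo.com/

  # Overview of all segments
  "$NUTCH" readseg -list -dir "$SEGMENTS"
  # Dump only the generate (fetch list) and fetch parts of the newest segment
  SEGMENT="$(ls -d "$SEGMENTS"/* | tail -1)"
  "$NUTCH" readseg -dump "$SEGMENT" segment_dump \
    -nocontent -noparse -noparsedata -noparsetext
else
  echo "bin/nutch or crawl data not found; adjust the paths above" >&2
fi
```

The status of the seed and of its redirect target in the readdb output
should show whether the target was recorded for the next round.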

Best,
Sebastian

On 03/18/2015 11:02 AM, Marko Asplund wrote:
> Hi,
> 
> I'm a newbie having trouble getting Nutch 1.9 to crawl a site that does an
> HTTP 301 redirect from http/80 to https/443.
> The Nutch fetch job issues the following message:
> 
> redirect count exceeded http://www.foo.com/
> 
> and it seems that nothing actually gets fetched.
> I've set http.redirect.max parameter value to 50.
> 
> I've only injected one seed URL to Nutch.
> The first fetch seems to download something, but the second generate job
> doesn't appear to produce a new segment,
> since there's only one segment in crawl DB after running it.
> 
> How can I debug the problem?
> 
> Is there a way to make Nutch logging more verbose? I've set
> http.verbose, but that didn't help.
> 
> How can I look at the crawl db and segment data contents (esp. fetch list)?
> I'm running Nutch in local mode.
> 
> marko
> 
