Problems with redirect handling: redirect count exceeded

Marko Asplund Wed, 18 Mar 2015 03:04:50 -0700

Hi,

I'm a newbie having trouble getting Nutch 1.9 to crawl a site that does an
HTTP 301 redirect from http/80 to https/443.
The Nutch fetch job logs the following message:

redirect count exceeded http://www.foo.com/

and it seems that nothing actually gets fetched.
I've set the http.redirect.max parameter to 50.
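One way to reason about why raising the cap might not help is this minimal Python sketch of a capped redirect follower. It is not Nutch's fetcher code. The `redirects` table, the `resolve` function and the `RedirectCountExceeded` exception are hypothetical. The sketch assumes the message appears whenever a chain needs more hops than the cap allows. Under that assumption, a single http -> https hop resolves well within a cap of 50, and only a loop (for example, the https URL redirecting back to http) would exceed it.

```python
from typing import Dict


class RedirectCountExceeded(Exception):
    """Raised when a redirect chain needs more hops than the cap allows."""


def resolve(url: str, redirects: Dict[str, str], max_redirects: int) -> str:
    """Follow redirects starting at `url` using a hypothetical lookup table.

    `redirects` maps a URL to the Location of its 301 response; any URL not
    in the table is treated as a final (non-redirect) response.
    """
    hops = 0
    while url in redirects:
        if hops >= max_redirects:
            raise RedirectCountExceeded(f"redirect count exceeded {url}")
        url = redirects[url]
        hops += 1
    return url


if __name__ == "__main__":
    # A single http -> https hop resolves with a cap of 50.
    simple = {"http://www.foo.com/": "https://www.foo.com/"}
    print(resolve("http://www.foo.com/", simple, 50))  # https://www.foo.com/

    # A loop (https redirecting back to http) exceeds any finite cap.
    loop = {
        "http://www.foo.com/": "https://www.foo.com/",
        "https://www.foo.com/": "http://www.foo.com/",
    }
    try:
        resolve("http://www.foo.com/", loop, 50)
    except RedirectCountExceeded as err:
        print(err)  # redirect count exceeded http://www.foo.com/
```

In this model, raising the cap only helps when the chain is finite. A loop fails at every cap value.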

I've only injected one seed URL to Nutch.
The first fetch seems to download something, but the second generate job
doesn't appear to produce a new segment,
since there's still only one segment in the crawl DB after running it.

How can I debug this problem?

Is there a way to make Nutch logging more verbose? I've set
http.verbose, but that didn't help.

How can I look at the contents of the crawl DB and segment data (especially the fetch list)?
I'm running Nutch in local mode.

marko
