See also https://issues.apache.org/jira/browse/NUTCH-1939 (it's a bug in Nutch 1.9)
On 03/19/2015 10:10 PM, Sebastian Nagel wrote:
> Hi Marko,
>
> even with
> http.redirect.max == 0
> Nutch follows redirect but they are like ordinary links
> recorded for fetch in the next round(s).
>
>> The first fetch seems to download something, but the second generate job
>> doesn't appear to produce a new segment,
> Are the redirect targets accepted by the URL filter patterns?
>
>> How can I look at the crawl db and segment data contents (esp. fetch list)?
>> I'm running Nutch in local mode.
> % bin/nutch readdb ...
> % bin/nutch readseg ...
> Help is shown when called without arguments.
>
> Best,
> Sebastian
>
> On 03/18/2015 11:02 AM, Marko Asplund wrote:
>> Hi,
>>
>> I'm a newbie having trouble getting Nutch 1.9 to crawl a site that does a
>> HTTP 301 redirect from http/80 to https/443.
>> Nutch fetch job issues the following message:
>>
>> redirect count exceeded http://www.foo.com/
>>
>> and it seems that nothing actually gets fetched.
>> I've set http.redirect.max parameter value to 50.
>>
>> I've only injected one seed URL to Nutch.
>> The first fetch seems to download something, but the second generate job
>> doesn't appear to produce a new segment,
>> since there's only one segment in crawl DB after running it.
>>
>> How can I debug problem?
>>
>> Is there a way to make Nutch logging more verbose? I've set
>> http.verbose, but that didn't help.
>>
>> How can I look at the crawl db and segment data contents (esp. fetch list)?
>> I'm running Nutch in local mode.
>>
>> marko
>>
>

