Hi Marko,

even with http.redirect.max == 0 Nutch follows redirects, but the redirect
targets are recorded like ordinary links and fetched in the next round(s).

> The first fetch seems to download something, but the second generate job
> doesn't appear to produce a new segment,

Are the redirect targets accepted by the URL filter patterns?

> How can I look at the crawl db and segment data contents (esp. fetch list)?
> I'm running Nutch in local mode.

% bin/nutch readdb ...
% bin/nutch readseg ...

Help is shown when called without arguments.

Best,
Sebastian


On 03/18/2015 11:02 AM, Marko Asplund wrote:
> Hi,
>
> I'm a newbie having trouble getting Nutch 1.9 to crawl a site that does a
> HTTP 301 redirect from http/80 to https/443.
> Nutch fetch job issues the following message:
>
> redirect count exceeded http://www.foo.com/
>
> and it seems that nothing actually gets fetched.
> I've set http.redirect.max parameter value to 50.
>
> I've only injected one seed URL to Nutch.
> The first fetch seems to download something, but the second generate job
> doesn't appear to produce a new segment,
> since there's only one segment in crawl DB after running it.
>
> How can I debug problem?
>
> Is there a way to make Nutch logging more verbose? I've set
> http.verbose, but that didn't help.
>
> How can I look at the crawl db and segment data contents (esp. fetch list)?
> I'm running Nutch in local mode.
>
> marko
>
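
For the readdb and readseg inspection mentioned in the reply, here is a rough sketch of how the calls might be wrapped in local mode. The function names (inspect_crawldb, inspect_fetchlist) and all paths are hypothetical and only for illustration. The -stats, -dump and -no* options are the ones printed by the help output of readdb and readseg in Nutch 1.x, so please verify them against your own version by calling the commands without arguments.

```shell
# Path to the Nutch launcher; set NUTCH beforehand if it lives elsewhere.
NUTCH="${NUTCH:-bin/nutch}"

# Print crawl db statistics, then dump all crawl db entries as text.
#   $1 = crawl db directory (hypothetical, e.g. crawl/crawldb)
#   $2 = output directory for the text dump (hypothetical)
inspect_crawldb() {
  "$NUTCH" readdb "$1" -stats
  "$NUTCH" readdb "$1" -dump "$2"
}

# Dump only the generate part of a segment (the fetch list) as text,
# by switching off every other part of the segment dump.
#   $1 = segment directory (hypothetical, one directory below crawl/segments)
#   $2 = output directory for the text dump (hypothetical)
inspect_fetchlist() {
  "$NUTCH" readseg -dump "$1" "$2" \
    -nocontent -nofetch -noparse -noparsedata -noparsetext
}
```

Called as `inspect_crawldb crawl/crawldb crawldb_dump`, the statistics should show whether the redirect target was recorded in the crawl db at all, which in turn helps to tell whether the URL filters rejected it.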

