I just checked the latest code in 1.4, and it is the same. See line 138 at the link below:

http://svn.apache.org/viewvc/nutch/branches/branch-1.4/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java?view=markup

The method just calls getResponse() and sets the followRedirects parameter to *false*. So I guess the http.redirect.max setting has no effect on it?

remi tassing wrote
> Would you give Nutch-1.4 a try? Maybe this bug is already solved?
>
> Remi
>
> On Thursday, February 23, 2012, xuyuanme <xuyuanme@> wrote:
>> Thanks for the information. But I found that the wiki page
>> http://wiki.apache.org/nutch/RedirectHandling still doesn't have much
>> content about Nutch redirects.
>>
>> I found that even if I set http.redirect.max=2 and
>> db.ignore.external.links=false, the crawler still can't get redirected
>> pages. With further digging, I found that the plugin lib-http (in
>> Nutch 1.1) contains the following code:
>>
>> Java file: org.apache.nutch.protocol.http.api.HttpBase
>>
>>   public ProtocolOutput getProtocolOutput(Text url, CrawlDatum datum) {
>>     ......
>>     response = getResponse(u, datum, false); // make a request
>>     ......
>>   }
>>
>>   protected abstract Response getResponse(URL url,
>>                                           CrawlDatum datum,
>>                                           boolean followRedirects)
>>     throws ProtocolException, IOException;
>>
>> After I changed the call to getResponse(u, datum, true) and recompiled
>> the plugin, the crawler fetched redirected pages as expected.
>>
>> So is this a bug in the lib-http library, or have I misunderstood how
>> redirects work?

--
View this message in context: http://lucene.472066.n3.nabble.com/http-redirect-max-tp3513652p3768744.html
Sent from the Nutch - User mailing list archive at Nabble.com.
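To make the question concrete: passing followRedirects=false at the protocol level does not by itself rule out redirect handling, because a caller can follow redirects on its own and apply a limit there. Below is a minimal, self-contained sketch of that pattern. It is not Nutch code, and whether Nutch's fetcher actually behaves this way is exactly what this thread is asking. Every name in it (RedirectSketch, Response, getResponse, fetch, the example.com URLs) is made up for illustration, and the "server" is just an in-memory map of redirects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Nutch code: all names below are assumptions.
class RedirectSketch {

    /** Result of a single request: either page content or a redirect target. */
    static final class Response {
        final String content;     // non-null when the page itself was returned
        final String redirectTo;  // non-null when the server answered with a redirect

        Response(String content, String redirectTo) {
            this.content = content;
            this.redirectTo = redirectTo;
        }
    }

    /** Stand-in for the protocol layer: performs one request and never follows redirects. */
    static Response getResponse(Map<String, String> redirects, String url) {
        String target = redirects.get(url);
        if (target != null) {
            return new Response(null, target);
        }
        return new Response("content of " + url, null);
    }

    /**
     * Stand-in for the caller: follows at most maxRedirects redirects itself.
     * Returns the page content, or null if the limit is exceeded.
     */
    static String fetch(Map<String, String> redirects, String url, int maxRedirects) {
        String current = url;
        for (int hops = 0; hops <= maxRedirects; hops++) {
            Response r = getResponse(redirects, current);
            if (r.redirectTo == null) {
                return r.content;   // reached a real page
            }
            current = r.redirectTo; // follow the redirect on the next iteration
        }
        return null;                // more redirects than allowed
    }

    public static void main(String[] args) {
        Map<String, String> redirects = new HashMap<>();
        redirects.put("http://example.com/a", "http://example.com/b");
        redirects.put("http://example.com/b", "http://example.com/c");

        // Two redirects are needed to reach /c, so a limit of 2 succeeds and 1 gives up.
        System.out.println(fetch(redirects, "http://example.com/a", 2));
        System.out.println(fetch(redirects, "http://example.com/a", 1));
    }
}
```

In this sketch the limit only matters in the caller; the protocol-level method always returns the redirect untouched, which is the behaviour the followRedirects=false call above would produce.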

