Hi,

Can you post your nutch-site.xml? I will give it a spin.
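
For anyone following the thread, here is a rough sketch of what a cap such as http.redirect.max=2 is meant to express: follow at most N redirect hops, then give up. This is not Nutch code. The class RedirectSketch, the method resolve, and the example URLs are all made up for illustration, and the map simply stands in for a server answering with 3xx responses. A cap of 0 is loosely comparable to calling getResponse() with followRedirects set to false.

```java
import java.util.HashMap;
import java.util.Map;

public class RedirectSketch {

    /**
     * Follows redirects starting at 'start', using 'redirects' as a stand-in
     * for server responses (key = requested URL, value = redirect target).
     * Returns the final URL, or null if more than 'maxRedirects' hops
     * would be needed. The hop cap also guarantees termination on loops.
     */
    static String resolve(Map<String, String> redirects, String start, int maxRedirects) {
        String current = start;
        int hops = 0;
        while (redirects.containsKey(current)) {
            if (hops >= maxRedirects) {
                return null; // cap reached: stop following
            }
            current = redirects.get(current);
            hops++;
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical chain: a -> b -> c
        Map<String, String> redirects = new HashMap<>();
        redirects.put("http://a.example/", "http://b.example/");
        redirects.put("http://b.example/", "http://c.example/");

        System.out.println(resolve(redirects, "http://a.example/", 2));
        System.out.println(resolve(redirects, "http://a.example/", 1));
        System.out.println(resolve(redirects, "http://a.example/", 0));
    }
}
```

With the two-hop chain above, a cap of 2 reaches the final page, while a cap of 1 or 0 gives up before getting there.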

Thank you

Lewis

On Thu, Feb 23, 2012 at 5:07 AM, xuyuanme <xuyua...@gmail.com> wrote:

> I just checked the latest code in 1.4, but it's the same. See line 138 at
> the link below:
>
>
> http://svn.apache.org/viewvc/nutch/branches/branch-1.4/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java?view=markup
>
> The method just calls getResponse() and sets the followRedirects parameter
> to *false*.
>
> So I guess the http.redirect.max setting has no effect on it?
>
>
> remi tassing wrote
> >
> > Would you give Nutch-1.4 a try? Maybe this bug is already solved?
> >
> > Remi
> >
> > On Thursday, February 23, 2012, xuyuanme <xuyuanme@> wrote:
> >> Thanks for the information. But I found the wiki page
> >> http://wiki.apache.org/nutch/RedirectHandling still doesn't have much
> >> content about Nutch redirects.
> >>
> >> I found that even if I set http.redirect.max=2 and
> >> db.ignore.external.links=false, the crawler still can't fetch redirected
> >> pages. Digging further, I found that the plugin lib-http (in Nutch 1.1)
> >> contains the following code:
> >>
> >> Java file: org.apache.nutch.protocol.http.api.HttpBase
> >>
> >>  public ProtocolOutput getProtocolOutput(Text url, CrawlDatum datum) {
> >> ......
> >>        response = getResponse(u, datum, false); // make a request
> >> ......
> >>  }
> >>
> >>  protected abstract Response getResponse(URL url,
> >>                                          CrawlDatum datum,
> >>                                          boolean followRedirects)
> >>    throws ProtocolException, IOException;
> >>
> >> After I changed the call to getResponse(u, datum, true) and recompiled
> >> the plugin, the crawler fetched redirected pages as expected.
> >>
> >> So is this a bug in the lib-http library, or have I misunderstood how
> >> redirects work?
> >
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/http-redirect-max-tp3513652p3768744.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
*Lewis*
