OK, for starters, we don't use crawl-urlfilter.txt anymore; it was
deprecated as of Nutch 1.2, IIRC.

Secondly, what are you trying to achieve here? Your url filter includes
+^http://www\.accessdata\.fda\.gov/scripts/cder/drugsatfda/index\.cfm\?fuseaction=Search\.SearchResults_Browse&DrugInitial=B$
+^http://www\.accessdata\.fda\.gov/scripts/cder/drugsatfda/index\.cfm\?fuseaction=Search\.Overview&DrugName=BACIGUENT$
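
To show why I'm asking: each of those rules is anchored with ^ and $, so it
can only ever match one exact URL. Below is a rough Python sketch of how I
understand the regex URL filter to behave (rules tried top to bottom, first
match decides, anything unmatched is dropped). This is not Nutch code, and
compile_rules / accepts are names I made up for the illustration.

```python
import re

# The two rules from the filter above, each rejoined onto a single line.
RULES = [
    r"+^http://www\.accessdata\.fda\.gov/scripts/cder/drugsatfda/index\.cfm\?fuseaction=Search\.SearchResults_Browse&DrugInitial=B$",
    r"+^http://www\.accessdata\.fda\.gov/scripts/cder/drugsatfda/index\.cfm\?fuseaction=Search\.Overview&DrugName=BACIGUENT$",
]


def compile_rules(lines):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule decides; a URL that matches nothing is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


if __name__ == "__main__":
    base = "http://www.accessdata.fda.gov/scripts/cder/drugsatfda/index.cfm"
    rules = compile_rules(RULES)
    for url in (
        base + "?fuseaction=Search.SearchResults_Browse&DrugInitial=B",
        base + "?fuseaction=Search.Overview&DrugName=BACIGUENT",
        base + "?fuseaction=Search.SearchResults_Browse&DrugInitial=C",
    ):
        print(accepts(rules, url), url)
```

Under those assumptions, any outlink other than the two listed URLs is
filtered out, so the crawl cannot get past those two pages.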

Your seed urls are also not exactly what I would expect for a seed list.

One last thing: your fetcher.threads.per.host is pretty aggressive. I
wouldn't personally set it this high unless it was my own server I was
communicating with.

So what exactly is it that you are having problems with?

Lewis



On Thu, Feb 23, 2012 at 12:11 PM, xuyuanme <xuyua...@gmail.com> wrote:

> Thanks! The config file can be downloaded here:
> http://dl.dropbox.com/u/6614015/temp/config.zip
>
>
> lewis john mcgibbney wrote
> >
> > Hi,
> >
> > Can you post your nutch-site.xml and I will give it a spin.
> >
> > Thank you
> >
> > Lewis
> >
> > On Thu, Feb 23, 2012 at 5:07 AM, xuyuanme <xuyuanme@> wrote:
> >
> >> Just checked the latest code in 1.4 but it's the same. See line 138 at
> >> the link below:
> >>
> >>
> >>
> http://svn.apache.org/viewvc/nutch/branches/branch-1.4/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java?view=markup
> >>
> >>
> >> The method just calls getResponse() and sets the followRedirects
> >> parameter to *false*.
> >>
> >> So I guess the http.redirect.max setting has no effect here?
> >>
> >>
> >
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/http-redirect-max-tp3513652p3769491.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
*Lewis*
