I've tested redirects and everything seems to work fine for me.

The site I checked, http://www.scotland.gov.uk, issues a temporary redirect to http://home.scotland.gov.uk/home.
Nutch fetches this fine once I set the redirects property in nutch-site.xml to -1 (just to demonstrate; I would not usually set it to that).

Lewis

On Thu, Feb 23, 2012 at 3:18 PM, Lewis John Mcgibbney <[email protected]> wrote:

> Additionally, regarding your nutch-site.xml: we don't maintain any query-* plugins,
> and there is no parse-text plugin either.
>
> On Thu, Feb 23, 2012 at 3:13 PM, Lewis John Mcgibbney <[email protected]> wrote:
>
>> OK, for starters we don't use crawl-urlfilter.txt anymore; it has been
>> deprecated as of Nutch 1.2, iirc.
>>
>> Secondly, what are you trying to achieve here? Your URL filter includes
>>
>> +^http://www\.accessdata\.fda\.gov/scripts/cder/drugsatfda/index\.cfm\?fuseaction=Search\.SearchResults_Browse&DrugInitial=B$
>> +^http://www\.accessdata\.fda\.gov/scripts/cder/drugsatfda/index\.cfm\?fuseaction=Search\.Overview&DrugName=BACIGUENT$
>>
>> Your seed URLs are also not exactly what I would expect for a seed list.
>>
>> One last thing: your fetcher.threads.per.host is pretty aggressive. I
>> wouldn't personally set it this high unless it was my own server I was
>> communicating with.
>>
>> So what exactly is it that you are having problems with?
>>
>> Lewis
>>
>> On Thu, Feb 23, 2012 at 12:11 PM, xuyuanme <[email protected]> wrote:
>>
>>> Thanks! The config file can be downloaded here:
>>> http://dl.dropbox.com/u/6614015/temp/config.zip
>>>
>>> lewis john mcgibbney wrote
>>> >
>>> > Hi,
>>> >
>>> > Can you post your nutch-site.xml and I will give it a spin.
>>> >
>>> > Thank you
>>> >
>>> > Lewis
>>> >
>>> > On Thu, Feb 23, 2012 at 5:07 AM, xuyuanme <xuyuanme@> wrote:
>>> >
>>> >> Just checked the latest code in 1.4 but it's the same. See code line 138
>>> >> in the link below:
>>> >>
>>> >> http://svn.apache.org/viewvc/nutch/branches/branch-1.4/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java?view=markup
>>> >>
>>> >> The method just calls getResponse() and sets the followRedirects parameter
>>> >> to *false*.
>>> >>
>>> >> So I guess the http.redirect.max setting has no effect on it?
>>> >
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/http-redirect-max-tp3513652p3769491.html
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>> --
>> *Lewis*
>
> --
> *Lewis*

--
*Lewis*
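
For reference, the redirects override I described above would look roughly like the fragment below in nutch-site.xml. Treat it as a sketch only: the property name http.redirect.max is the one xuyuanme mentioned earlier in the thread, -1 is just the demonstration value from my test and not a setting I would recommend, and the description text is my own placeholder wording rather than what ships in nutch-default.xml.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Property name taken from the thread; -1 is the demonstration value only. -->
  <property>
    <name>http.redirect.max</name>
    <value>-1</value>
    <!-- Placeholder description (assumption), not the shipped text. -->
    <description>Redirect limit used when fetching pages.</description>
  </property>
</configuration>
```

Anything set in nutch-site.xml overrides the default of the same name, so removing the property again restores the stock behaviour.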

