I've tested redirect handling and everything seems to work fine for me.

The site I checked,

http://www.scotland.gov.uk

issues a temporary redirect to

http://home.scotland.gov.uk/home

Nutch fetches this fine after some tweaking in nutch-site.xml: I set the
http.redirect.max property to -1 (just to demonstrate; I would not usually
set it so).
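
For reference, a tweak like this would go into nutch-site.xml roughly as
follows. This is only a sketch: it assumes the redirects property meant here
is http.redirect.max, the setting this thread is about, and it reuses the -1
demonstration value rather than a recommended one.

```xml
<!-- nutch-site.xml: local overrides for values in nutch-default.xml -->
<configuration>
  <property>
    <name>http.redirect.max</name>
    <!-- The value -1 is the demonstration value from this message only -->
    <value>-1</value>
  </property>
</configuration>
```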

Lewis

On Thu, Feb 23, 2012 at 3:18 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Additionally, your nutch-site.xml references query-(plugins), which we
> don't maintain, and there is no parse-text plugin either.
>
>
> On Thu, Feb 23, 2012 at 3:13 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
>> OK, for starters, we don't use crawl-urlfilter.txt anymore; it has been
>> deprecated as of Nutch 1.2, iirc.
>>
>> Secondly, what are you trying to achieve here? Your url filter includes:
>> +^http://www\.accessdata\.fda\.gov/scripts/cder/drugsatfda/index\.cfm\?fuseaction=Search\.SearchResults_Browse&DrugInitial=B$
>> +^http://www\.accessdata\.fda\.gov/scripts/cder/drugsatfda/index\.cfm\?fuseaction=Search\.Overview&DrugName=BACIGUENT$
>>
>> Your seed urls are also not exactly what I would expect for a seed list.
>>
>> One last thing: your fetcher.threads.per.host is pretty aggressive. I
>> wouldn't personally set it this high unless it was my own server I was
>> communicating with.
>>
>> So what exactly is it that you are having problems with?
>>
>> Lewis
>>
>>
>>
>>
>> On Thu, Feb 23, 2012 at 12:11 PM, xuyuanme <[email protected]> wrote:
>>
>>> Thanks! The config files can be downloaded here:
>>> http://dl.dropbox.com/u/6614015/temp/config.zip
>>>
>>>
>>> lewis john mcgibbney wrote
>>> >
>>> > Hi,
>>> >
>>> > Can you post your nutch-site.xml and I will give it a spin.
>>> >
>>> > Thank you
>>> >
>>> > Lewis
>>> >
>>> > On Thu, Feb 23, 2012 at 5:07 AM, xuyuanme <xuyuanme@> wrote:
>>> >
>>> >> Just checked the latest code in 1.4 but it's the same. See line 138 in
>>> >> the link below:
>>> >>
>>> >>
>>> >> http://svn.apache.org/viewvc/nutch/branches/branch-1.4/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java?view=markup
>>> >>
>>> >> The method just calls getResponse() with the followRedirects parameter
>>> >> set to *false*.
>>> >>
>>> >> So I guess the http.redirect.max setting has no effect on it?
>>> >>
>>> >>
>>> >
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/http-redirect-max-tp3513652p3769491.html
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> *Lewis*
>>
>>
>
>
> --
> *Lewis*
>
>


-- 
*Lewis*
