Lewis -- Looks like a duplicate of NUTCH-1284.  Sorry for not catching that
before posting.
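
For anyone reading this thread later, here is a minimal Java sketch of the behaviour the property description promises, next to a comparison that has no special case for -1. It is only an illustration under stated assumptions: the class and method names (CrawlDelayCheck, shouldSkipDocumented, shouldSkipUnguarded) are hypothetical and are not taken from Fetcher.java, and the unguarded variant is a guess at the kind of logic that would produce the robots_denied result I saw, not a confirmed reading of the Nutch source.

```java
// Hypothetical sketch; names are illustrative and not part of Nutch.
final class CrawlDelayCheck {

    /**
     * Behaviour as described for fetcher.max.crawl.delay.
     *
     * @param robotsDelayMs    Crawl-Delay from robots.txt in milliseconds
     *                         (0 or less when robots.txt sets none)
     * @param maxCrawlDelaySec configured limit in seconds; -1 means "no limit"
     * @return true if the page should be skipped
     */
    static boolean shouldSkipDocumented(long robotsDelayMs, int maxCrawlDelaySec) {
        if (robotsDelayMs <= 0) {
            return false; // no Crawl-Delay given, nothing to compare
        }
        if (maxCrawlDelaySec < 0) {
            return false; // -1: never skip, wait however long robots.txt asks
        }
        return robotsDelayMs > maxCrawlDelaySec * 1000L;
    }

    /**
     * Assumed faulty variant: the same comparison without the guard for -1.
     * Any positive Crawl-Delay is greater than -1000 ms, so every such page
     * would be skipped.
     */
    static boolean shouldSkipUnguarded(long robotsDelayMs, int maxCrawlDelaySec) {
        if (robotsDelayMs <= 0) {
            return false;
        }
        return robotsDelayMs > maxCrawlDelaySec * 1000L;
    }

    public static void main(String[] args) {
        long fiveSeconds = 5000L;
        System.out.println("documented, max=-1: " + shouldSkipDocumented(fiveSeconds, -1));
        System.out.println("unguarded,  max=-1: " + shouldSkipUnguarded(fiveSeconds, -1));
    }
}
```

With a limit of -1 the two methods disagree for any site that sets a Crawl-Delay, which matches the symptom I described; with a large positive limit they agree, which would explain why the workaround of setting a high value helps.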

-----Original Message-----
From: Lewis John Mcgibbney [mailto:[email protected]] 
Sent: Saturday, April 27, 2013 3:30 PM
To: [email protected]
Subject: Re: Nutch 1.6 Processing of fetcher.max.crawl.delay

Hi,
@Tejas, you will remember that the work undertaken on NUTCH-1284 (the patch
you submitted for it included the fix for NUTCH-1042) relates to this.
I am not sure if the situations are identical, but they are closely linked
by the looks of it.
@Iain, can you look at the commentary and provide your input? Thank you so
much.
Also, note that this fix has not been released yet; it is in the trunk and
2.x branches, which we hope to release soon.
Thanks
Lewis


On Sat, Apr 27, 2013 at 1:16 PM, Tejas Patil
<[email protected]> wrote:

> Thanks, Iain, for raising this. I will look into it. Can you kindly
> share the URLs for which you see this behavior? I can run a crawl with
> those and try to reproduce it at my end.
>
>
> On Sat, Apr 27, 2013 at 1:13 PM, Iain Lopata <[email protected]> wrote:
>
> > Using Nutch 1.6, I am having a problem with the processing of 
> > fetcher.max.crawl.delay.
> >
> >
> >
> > The description for this property states that "If the Crawl-Delay in
> > robots.txt is set to greater than this value (in seconds) then the
> > fetcher will skip this page, generating an error report. If set to -1
> > the fetcher will never skip such pages and will wait the amount of time
> > retrieved from robots.txt Crawl-Delay, however long that might be."
> >
> >
> >
> > I have found that the processing is not as stated when the value is set
> > to -1. If I set the value of fetcher.max.crawl.delay to -1, any URL on a
> > site that has Crawl-Delay specified in the applicable section of
> > robots.txt is rejected with a robots_denied(18).
> >
> >
> >
> > I am not a Java developer and I am completely new to using Nutch, but
> > this looks like it may be either a documentation error for the property
> > or a problem with the logic in Fetcher.java at Line 682.
> >
> >
> >
> > I can work around this by setting the property to some high value, 
> > but perhaps this is a problem that someone would like to look at.
> >
> >
> >
> > Happy to post in Jira if someone can confirm my assessment or if 
> > this is the right way to get this investigated.
> >
> >
> >
> > Thanks
>



--
*Lewis*
