Perfect, thank you Julien!

On Thu, Jun 26, 2014 at 10:21 AM, Julien Nioche <
[email protected]> wrote:

> > If I set the fetcher.threads.per.queue property to more than 1, I believe
> > Nutch would run that many threads per host. In that case, would Nutch
> > still respect the Crawl-Delay directive in robots.txt and not crawl at a
> > faster pace than what is specified in robots.txt?
> >
> > In short, what I am trying to ask is whether setting
> > fetcher.threads.per.queue to 1 is required for being as polite as
> > Crawl-Delay in robots.txt expects.
> >
>
> Using more than 1 thread per queue will ignore any crawl-delay obtained
> from robots.txt (see
> https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java#L317)
> and use the fetcher.server.min.delay configuration, which has a default
> value of 0. So yes, setting fetcher.threads.per.queue to 1 is required for
> being as polite as Crawl-Delay in robots.txt expects.
>
> HTH
>
> Julien
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
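
For reference, the rule Julien describes can be sketched as a small Java model. This is only an illustration of the behaviour stated in the reply, not the actual Nutch source: the class name CrawlDelaySketch, the method effectiveDelayMs, and the 10-second Crawl-Delay value are all hypothetical, and the 0 default for fetcher.server.min.delay is taken from the reply above.

```java
/** Simplified model of the per-queue delay choice described above; not Nutch code. */
final class CrawlDelaySketch {

    /**
     * Returns the pause, in milliseconds, assumed to apply between two
     * successive fetches from the same queue.
     *
     * @param threadsPerQueue value of fetcher.threads.per.queue
     * @param crawlDelayMs    delay derived from robots.txt Crawl-Delay
     * @param minDelayMs      value of fetcher.server.min.delay, in milliseconds
     */
    static long effectiveDelayMs(int threadsPerQueue, long crawlDelayMs, long minDelayMs) {
        if (threadsPerQueue < 1) {
            throw new IllegalArgumentException("threadsPerQueue must be >= 1");
        }
        // With more than one thread per queue, the robots.txt delay is ignored
        // and the configured minimum delay is used instead.
        return threadsPerQueue > 1 ? minDelayMs : crawlDelayMs;
    }

    public static void main(String[] args) {
        long robotsCrawlDelayMs = 10_000L; // hypothetical "Crawl-Delay: 10"
        long minDelayMs = 0L;              // default stated in the reply above

        // One thread per queue: the robots.txt delay is honoured.
        System.out.println(effectiveDelayMs(1, robotsCrawlDelayMs, minDelayMs));
        // Four threads per queue: only the minimum delay applies.
        System.out.println(effectiveDelayMs(4, robotsCrawlDelayMs, minDelayMs));
    }
}
```

Under these assumptions, a multi-threaded queue with the default minimum delay has no enforced pause between requests to the same host, which is why a single thread per queue is needed to stay within Crawl-Delay.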
