>
> If I set the fetcher.threads.per.queue property to more than 1, I believe
> Nutch would run that many threads per host. In that case, would Nutch
> still respect the Crawl-Delay directive in robots.txt and not crawl at a
> faster pace than what is specified in robots.txt?
>

> In short what I am trying to ask is if setting fetcher.threads.per.queue
> to 1 is required for being as polite as Crawl-Delay in robots.txt expects?
>

With more than 1 thread per queue, the fetcher ignores any Crawl-Delay
obtained from robots.txt (see
https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java#L317)
and uses the fetcher.server.min.delay setting instead, which has a default
value of 0. So yes, setting fetcher.threads.per.queue to 1 is required for
being as polite as Crawl-Delay in robots.txt expects.
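
For reference, a minimal sketch of the relevant override, assuming it goes in
your conf/nutch-site.xml. Only the property name comes from the discussion
above; the description text is mine and purely illustrative.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- One thread per queue, so the Crawl-Delay from robots.txt is used.
       With a value above 1, fetcher.server.min.delay (default 0) would
       be applied instead. -->
  <property>
    <name>fetcher.threads.per.queue</name>
    <value>1</value>
    <description>Illustrative override: keep a single fetcher thread per
    queue.</description>
  </property>
</configuration>
```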

HTH

Julien

-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
