> If I set the fetcher.threads.per.queue property to more than 1, I believe the
> behavior would be to have that many threads per host in Nutch. In that case,
> would Nutch still respect the Crawl-Delay directive in robots.txt and not
> crawl at a faster pace than what is specified in robots.txt?
>
> In short, what I am trying to ask is: is setting fetcher.threads.per.queue
> to 1 required for being as polite as Crawl-Delay in robots.txt expects?

With more than 1 thread per queue, the fetcher ignores any crawl-delay obtained
from robots.txt (see
https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java#L317)
and uses the fetcher.server.min.delay setting instead, which defaults to 0
seconds, i.e. no pause at all between requests to the same host.

So yes, setting fetcher.threads.per.queue to 1 is required for being as polite
as Crawl-Delay in robots.txt expects.

HTH

Julien

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
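The delay selection described in the reply can be sketched in Java. This is a hypothetical, simplified illustration of the behavior, not Nutch's actual Fetcher code: the class QueueDelaySketch, the method nextFetchTime and its parameter names are made up here, and the sketch assumes all delays have already been converted to milliseconds.

```java
/**
 * Hypothetical, simplified sketch of how a per-host fetch queue could choose
 * the pause before its next request. Not Nutch's actual implementation.
 */
public class QueueDelaySketch {

    /**
     * Returns the earliest time (ms) at which the queue may issue its next request.
     *
     * @param endTimeMs       time at which the previous fetch finished
     * @param threadsPerQueue value of fetcher.threads.per.queue
     * @param crawlDelayMs    Crawl-Delay taken from robots.txt, in ms
     * @param minCrawlDelayMs fetcher.server.min.delay, converted to ms
     */
    public static long nextFetchTime(long endTimeMs, int threadsPerQueue,
                                     long crawlDelayMs, long minCrawlDelayMs) {
        // One thread per queue: the robots.txt crawl delay is honoured.
        // More than one: only the minimum delay applies, so Crawl-Delay is ignored.
        long delay = (threadsPerQueue > 1) ? minCrawlDelayMs : crawlDelayMs;
        return endTimeMs + delay;
    }

    public static void main(String[] args) {
        long finished = 1_000L;          // previous fetch finished at t = 1000 ms
        long robotsCrawlDelay = 10_000L; // "Crawl-Delay: 10" in robots.txt
        long minDelay = 0L;              // fetcher.server.min.delay default (0 s)

        System.out.println(nextFetchTime(finished, 1, robotsCrawlDelay, minDelay)); // 11000
        System.out.println(nextFetchTime(finished, 4, robotsCrawlDelay, minDelay)); // 1000
    }
}
```

With a single thread per queue the next request waits until t = 11000 ms, whereas with four threads per queue the host may be hit again immediately at t = 1000 ms, which is why fetcher.threads.per.queue = 1 is needed to respect Crawl-Delay.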

