Perfect, thank you Julien!
On Thu, Jun 26, 2014 at 10:21 AM, Julien Nioche <[email protected]> wrote:

> > If I set fetcher.threads.per.queue property to more than 1, I believe the
> > behavior would be to have those many number of threads per host from Nutch,
> > in that case would Nutch still respect the Crawl-Delay directive in
> > robots.txt and not crawl at a faster pace that what is specified in
> > robots.txt.
> >
> > In short what I am trying to ask is if setting fetcher.threads.per.queue
> > to 1 is required for being as polite as Crawl-Delay in robots.txt expects?
>
> Using more than 1 thread per queue will ignore any crawl-delay obtained
> from robots.txt (see
> https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java#L317)
> and use the fetcher.server.min.delay configuration which has a default
> value of 0. So yes, setting fetcher.threads.per.queue to 1 is required for
> being as polite as Crawl-Delay in robots.txt expects.
>
> HTH
>
> Julien
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
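
For anyone finding this thread later, the rule Julien describes can be sketched roughly as follows. This is a simplified illustration of the behavior as explained above, not the actual Nutch source: the class name, method name, and the 10-second Crawl-Delay value are hypothetical, and delays are assumed to be in milliseconds.

```java
// Illustrative sketch only; names are hypothetical and not part of Nutch.
final class PolitenessSketch {

    /**
     * Returns the delay applied between two fetches from the same queue.
     *
     * @param threadsPerQueue value of fetcher.threads.per.queue
     * @param crawlDelayMs    delay taken from the Crawl-Delay directive in robots.txt
     * @param minDelayMs      value of fetcher.server.min.delay (0 by default, per the thread)
     */
    static long effectiveDelayMs(int threadsPerQueue, long crawlDelayMs, long minDelayMs) {
        // With more than one thread per queue the robots.txt delay is ignored
        // and only the configured minimum delay applies.
        if (threadsPerQueue > 1) {
            return minDelayMs;
        }
        // With a single thread per queue the Crawl-Delay is honored.
        return crawlDelayMs;
    }

    public static void main(String[] args) {
        long crawlDelayMs = 10_000L; // hypothetical Crawl-Delay of 10 seconds
        long minDelayMs = 0L;        // default of fetcher.server.min.delay, per the thread

        System.out.println("1 thread per queue:  " + effectiveDelayMs(1, crawlDelayMs, minDelayMs) + " ms");
        System.out.println("3 threads per queue: " + effectiveDelayMs(3, crawlDelayMs, minDelayMs) + " ms");
    }
}
```

In other words, keeping fetcher.threads.per.queue at 1 is what makes the Crawl-Delay from robots.txt take effect; with several threads, only fetcher.server.min.delay stands between consecutive requests to the same host.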

