Hi,

All requests to the same host are processed in the same
fetch queue, which also makes sure that the configured
delay (or the one specified in robots.txt) is observed.
With 10 threads and only one host to crawl,
9 of the threads are just idle. Things are
different if there are multiple hosts to crawl (>=10).
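
To make that concrete, here is a toy simulation of the behaviour described
above. It is only a sketch and not Nutch code: the function name, the
parameters, and the model itself (one request in flight per host queue, the
delay counted from the end of the previous fetch, a fixed fetch duration) are
all assumptions made for illustration.

```python
import heapq

def simulate(urls_per_host, threads, delay_ms, fetch_ms):
    """Return the simulated time (ms) at which the last fetch finishes.

    Assumed model: one queue per host, at most one request in flight per
    queue, and a queue releases its next URL only `delay_ms` after its
    previous fetch has finished.
    """
    remaining = dict(urls_per_host)        # host -> URLs still to fetch
    next_ok = {h: 0 for h in remaining}    # host -> earliest next start time
    free_at = [0] * threads                # min-heap of times each thread is free
    heapq.heapify(free_at)
    finished = 0
    while any(remaining.values()):
        t = heapq.heappop(free_at)         # next thread to become free
        # choose the pending host whose queue becomes ready soonest
        host = min((h for h in remaining if remaining[h]),
                   key=lambda h: next_ok[h])
        start = max(t, next_ok[host])      # thread waits if the queue is not ready
        end = start + fetch_ms
        remaining[host] -= 1
        next_ok[host] = end + delay_ms
        heapq.heappush(free_at, end)
        finished = max(finished, end)
    return finished

one_host = {"a.example": 10}
ten_hosts = {f"h{i}.example": 10 for i in range(10)}

print(simulate(one_host, threads=1, delay_ms=300, fetch_ms=100))
print(simulate(one_host, threads=10, delay_ms=300, fetch_ms=100))
print(simulate(ten_hosts, threads=10, delay_ms=300, fetch_ms=100))
```

Under these assumptions the single-host crawl takes the same time with 1 or
10 threads, because the queue delay is the bottleneck. With ten hosts the
same 10 threads fetch ten times as many URLs in that time.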

Cheers,
Sebastian

On 01/06/2016 08:51 PM, Manish Verma wrote:
> Hi,
> I am using Nutch 1.10 and have some confusion about concurrency and crawl 
> delay.
> 
> For example:
> 
> fetcher.server.min.delay = .300
> fetcher.threads.per.queue = 10
> fetcher.queue.mode = byHost (for simplicity let's assume there is only one 
> host)
> 
> 
> 
> Now that we have defined 10 threads, how will this behave? Will 10 requests 
> be sent to the host at the same time, or will the first thread hit it and 
> the second thread hit it 300 ms later?
> If the threads cannot hit the host at the same time, then what is the use of 
> having multiple threads, since each thread has to wait 300 ms?
> 
> 
> Thanks MV
> 
