Re: nutch hadoop only one slave is crawling

Rémy Amouroux Wed, 23 May 2012 23:39:39 -0700

Hi

The fetcher threads (10 by default, as configured in nutch-default.xml through 
fetcher.threads.fetch) take their jobs from a queue.
By default, there is only one queue per host (the queue mode is set by the 
property fetcher.queue.mode), and another property limits the number of 
threads allowed to access a queue at the same time (fetcher.threads.per.queue, 
default is 1 in nutch-default.xml) in order to be polite to the crawled web 
site.


So, since you are crawling only one website, you have only one queue, and only 
one thread is allowed to fetch at any given moment.

By raising fetcher.threads.per.queue in nutch-site.xml, you can have more 
threads fetching at the same time, capped by fetcher.threads.fetch.
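
As a rough sketch, the override in nutch-site.xml could look something like 
the fragment below. The value 3 is only an illustrative assumption, not a 
recommendation; choose it according to what the crawled site can tolerate.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Allow several fetcher threads to work on the same queue (host).
       Default in nutch-default.xml is 1. The value 3 is illustrative. -->
  <property>
    <name>fetcher.threads.per.queue</name>
    <value>3</value>
  </property>

  <!-- Total number of fetcher threads; this caps the setting above.
       10 is the default mentioned in this message, shown for reference. -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>10</value>
  </property>
</configuration>
```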

Regards

PS: be careful and think of the impact of the new configuration on this website 
:-)

RemyA

On 24 May 2012, at 06:12, Dustine Rene Bernasor wrote:

> I have a 3-slaves hadoop cluster and I am performing a crawl on a single 
> website. However, only 1 slave is performing fetching (though the other 
> slaves are still alive). Is this normal behavior if only 1 domain is crawled? 
> Is there any way to force the other slaves to fetch?
> 
> Thanks.
>
