Re: nutch hadoop only one slave is crawling

[email protected] Thu, 24 May 2012 03:41:30 -0700

The queue runs on a single datanode?



Rémy Amouroux wrote
> 
> Hi
> 
> The fetcher threads (10 by default, as configured in nutch-default.xml
> through fetcher.threads.fetch) take their jobs from a queue.
> By default, there is only one queue per host (defined by the property
> fetcher.queue.mode), and a configuration property limits the
> number of threads allowed to access a queue (property
> fetcher.threads.per.queue, default is 1 in nutch-default.xml) in order to
> be polite to the crawled web site.
> 
> So, if you are crawling only one website, you have one queue, and only
> one thread is allowed to fetch at a given moment.
> 
> By raising fetcher.threads.per.queue in nutch-site.xml, you can have
> more threads fetching at the same time, capped by
> fetcher.threads.fetch.
> 
> Regards
> 
> PS: be careful and think of the impact of the new configuration on this
> website :-)
> 
> RemyA
> 
> On 24 May 2012 at 06:12, Dustine Rene Bernasor wrote:
> 
>> I have a 3-slave Hadoop cluster and I am performing a crawl on a single
>> website. However, only 1 slave is performing fetching (though the other
>> slaves are still alive). Is this normal behavior if only 1 domain is
>> crawled? Is there any way to force the other slaves to fetch?
>> 
>> Thanks.
>>
> 
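
For reference, the override described in the quoted message might look roughly like the fragment below in nutch-site.xml. This is only a sketch: the property name is the one mentioned above, but the value 3 is an arbitrary illustration, not a recommendation, and should be chosen with the target site's load in mind.

```xml
<!-- nutch-site.xml: values here override those in nutch-default.xml -->
<configuration>
  <property>
    <name>fetcher.threads.per.queue</name>
    <!-- Illustrative value only (assumption); the default is 1 -->
    <value>3</value>
  </property>
</configuration>
```

As noted in the quoted message, the number of threads fetching at once is still capped by fetcher.threads.fetch.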


--
View this message in context: 
http://lucene.472066.n3.nabble.com/nutch-hadoop-only-one-slave-is-crawling-tp3985825p3985886.html
Sent from the Nutch - User mailing list archive at Nabble.com.