Hi Andrzej,

I am seeing a similar problem: the fetch is not being distributed across
multiple nodes. I am crawling a single host on an intranet, and I would like
to know how I can modify Nutch's behavior so that it distributes the fetch
over multiple nodes.

Soila

Andrzej Bialecki wrote:
> 
> brainstorm wrote:
>> Sure, I tried mapred.map.tasks and mapred.reduce.tasks with
>> values 2 and 1 respectively *in the past*, same results. Right now I
>> have 32 for both: same results, as those settings are just a hint for
>> Nutch.
>> 
>> Regarding number of threads *per host* I tried with 10 and 20 in the
>> past, same results.
> 
> Indeed, the default number of maps and reduces can be changed for any 
> particular job - the number of maps is adjusted according to the number 
> of input splits (InputFormat.getSplits()), and the number of reduces can 
> be adjusted programmatically in the application.
> 
> Back to your issue: I suspect that your fetchlist is highly homogeneous, 
> i.e. contains URLs from a single host. Nutch makes sure that all URLs 
> from a single host end up in a single map task, to enforce the politeness 
> settings, so that's probably why you see only a single map task fetching 
> all URLs.
> 
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> 
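
To illustrate the point quoted above, that all URLs from one host land in one
map task, here is a minimal sketch of host-based partitioning. It is not
Nutch's actual code: the class HostPartitionSketch, the helpers partitionFor
and partition, the host name intranet.example and the figure of 32 partitions
are all assumptions made up for this example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HostPartitionSketch {

    // Hypothetical helper: choose a partition from the host name only,
    // so every URL of the same host maps to the same partition.
    static int partitionFor(String url, int numPartitions) {
        String host = URI.create(url).getHost();
        // Fall back to the whole URL if no host can be parsed.
        String key = (host == null) ? url : host.toLowerCase();
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Group URLs by the partition they would be assigned to.
    static Map<Integer, List<String>> partition(List<String> urls, int numPartitions) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (String url : urls) {
            int p = partitionFor(url, numPartitions);
            result.computeIfAbsent(p, k -> new ArrayList<>()).add(url);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed fetchlist: every URL comes from one intranet host.
        List<String> urls = List.of(
            "http://intranet.example/index.html",
            "http://intranet.example/docs/a.html",
            "http://intranet.example/docs/b.html");

        Map<Integer, List<String>> parts = partition(urls, 32);
        System.out.println("non-empty partitions: " + parts.size());
    }
}
```

With a single-host fetchlist only one partition is non-empty, no matter how
many partitions are requested, which matches the single busy map task
described in the quoted reply.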

-- 
View this message in context: 
http://www.nabble.com/Distributed-fetching-only-happening-in-one-node---tp18429531p18915705.html
Sent from the Nutch - User mailing list archive at Nabble.com.
