brainstorm wrote:
Sure, I tried mapred.map.tasks and mapred.reduce.tasks with
values 2 and 1 respectively *in the past*: same results. Right now I
have 32 for both: same results, as those settings are just a hint for
Nutch.
Regarding the number of threads *per host*, I tried 10 and 20 in the
past: same results.
Indeed, the default number of maps and reduces can be changed for any
particular job - the number of maps is adjusted according to the number
of input splits (InputFormat.getSplits()), and the number of reduces can
be adjusted programmatically in the application.
Back to your issue: I suspect that your fetchlist is highly homogeneous,
i.e. contains URLs from a single host. Nutch makes sure that all URLs
from a single host end up in a single map task, to enforce the politeness
settings, so that's probably why you see only a single map task fetching
all URLs.
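To illustrate the effect, here is a minimal sketch in plain Java, assuming
the map task is chosen from a hash of the host name. The class and method
names (HostPartitionSketch, partitionFor, assign) are made up for this
example and are not Nutch's actual partitioner code, which may differ in
detail. It only shows why a single-host fetchlist leaves all but one map
task without work, however many map tasks are configured.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not a class from Nutch or Hadoop.
class HostPartitionSketch {

    // Pick a map task for a URL based only on its host name (assumed scheme).
    static int partitionFor(String url, int numMapTasks) {
        String host = URI.create(url).getHost();
        if (host == null) {
            host = ""; // URLs without a host all share one partition
        }
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(host.hashCode(), numMapTasks);
    }

    // Group URLs by the map task they would be assigned to.
    static Map<Integer, List<String>> assign(List<String> urls, int numMapTasks) {
        Map<Integer, List<String>> tasks = new TreeMap<>();
        for (String url : urls) {
            int task = partitionFor(url, numMapTasks);
            tasks.computeIfAbsent(task, k -> new ArrayList<>()).add(url);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // A homogeneous fetchlist: every URL comes from the same host.
        List<String> urls = Arrays.asList(
            "http://example.com/a",
            "http://example.com/b",
            "http://example.com/c");
        Map<Integer, List<String>> tasks = assign(urls, 32);
        System.out.println(tasks.size() + " of 32 map tasks have work");
    }
}
```

With URLs spread over many hosts, the same grouping would distribute the
work across several map tasks.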
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com