Jesse Hires wrote:
I have a two datanode and one namenode setup. One of my datanodes is slower
than the other, causing the fetch to run significantly longer on it. Is
there a way to balance this out?

Most likely the number of URLs per host is unbalanced: the tasktracker that takes the longest has been assigned a lot of URLs from a single host.

A workaround for this is to limit the maximum number of URLs per host (in nutch-site.xml) to a more reasonable number, e.g. 100 or 1000, whichever works best for you.
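
For illustration, a minimal sketch of what such an override in nutch-site.xml might look like. The property name generate.max.per.host is an assumption based on older Nutch releases; later versions appear to use generate.max.count together with generate.count.mode instead, so check the nutch-default.xml shipped with your version before copying this. The value 100 is only one of the example numbers mentioned above.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed property name (older Nutch releases); verify it in your
       nutch-default.xml. Caps how many URLs from a single host go into
       one generated fetchlist. -->
  <property>
    <name>generate.max.per.host</name>
    <value>100</value>
  </property>
</configuration>
```

The cap only takes effect when a new segment is generated, so segments that already exist keep their current distribution of URLs.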

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com