Jesse Hires wrote:
I have a setup with two datanodes and one namenode. One of my datanodes is slower than the other, so the fetch runs significantly longer on it. Is there a way to balance this out?
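Most likely the number of URLs per host is unbalanced, meaning that the tasktracker that takes the longest has been assigned a lot of URLs from a single host. Since the fetcher waits between successive requests to the same host, that task is limited by the politeness delay rather than by the speed of the node.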
A workaround for this is to limit the max number of URLs per host (in nutch-site.xml) to a more reasonable number, e.g. 100 or 1000, whatever works best for you.
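As a rough sketch, the override in nutch-site.xml could look like the fragment below. The property name is an assumption that depends on your Nutch version: releases around 1.0 call it generate.max.per.host, while later releases use generate.max.count together with generate.count.mode, so check the nutch-default.xml that ships with your install. The value 100 is just the example figure from above.

```xml
<!-- nutch-site.xml: values here override nutch-default.xml -->
<configuration>
  <property>
    <!-- Assumed name for Nutch 1.0-era releases; verify against your nutch-default.xml -->
    <name>generate.max.per.host</name>
    <!-- Example cap only; -1 means no limit -->
    <value>100</value>
    <description>Maximum number of URLs per host in a single fetchlist.</description>
  </property>
</configuration>
```

The limit is applied when the fetchlist is generated, so it only takes effect for segments generated after the change.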
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com