Thanks, I'll give that a shot!

Jesse

int GetRandomNumber()
{
   return 4; // Chosen by fair roll of dice
             // Guaranteed to be random
} // xkcd.com
On Thu, Oct 29, 2009 at 5:53 AM, Andrzej Bialecki <a...@getopt.org> wrote:

> Jesse Hires wrote:
>
>> I have a two datanode and one namenode setup. One of my datanodes is
>> slower than the other, causing the fetch to run significantly longer on it.
>> Is there a way to balance this out?
>
> Most likely the number of URLs/host is unbalanced, meaning that the
> tasktracker that takes the longest is assigned a lot of URLs from a single
> host.
>
> A workaround for this is to limit the max number of URLs per host (in
> nutch-site.xml) to a more reasonable number, e.g. 100 or 1000, whatever
> works best for you.
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
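For reference, the per-host cap suggested above might look roughly like this in nutch-site.xml. This is a minimal sketch, assuming a Nutch 1.0-era release where the cap is the `generate.max.per.host` property (default -1, meaning unlimited); later releases replaced it with `generate.max.count` plus `generate.count.mode`, so check the name against your own nutch-default.xml before relying on it. The value 100 is just the example figure from the reply, not a tuned recommendation.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed property name (Nutch 1.0-era); verify in nutch-default.xml. -->
  <!-- Caps how many URLs from any single host go into one generated fetchlist, -->
  <!-- so one fetch task is not stuck working through a huge single-host queue. -->
  <property>
    <name>generate.max.per.host</name>
    <value>100</value>
  </property>
</configuration>
```

The cap is applied when the fetchlist is generated, so it takes effect on the next generate/fetch cycle rather than on a fetch already in progress.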