Hi Ali,

Please check out this post [0] I found. I have to agree with the response in the thread and say that I don't know how Hadoop ensures an even distribution of the workload. However, we can assume that by explicitly specifying the number of mappers and reducers we can ensure that all of them 'will' be used across your cluster.
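
To make the point about setting the task count explicitly a bit more concrete, here is a rough Python sketch. It is not Hadoop or Nutch code, only an illustration in the spirit of a hash partitioner: the number of partitions is a parameter you choose, not something derived from the block size, and how evenly the URLs spread depends on the key being hashed. The function names (partition_for, distribute), the choice of the host as the key, the use of crc32 as the hash, and the example.com hosts are all assumptions made for the illustration.

```python
from collections import Counter
from urllib.parse import urlparse
import zlib


def partition_for(url: str, num_partitions: int) -> int:
    """Pick a partition for a URL by hashing its host (assumed key)."""
    host = urlparse(url).hostname or ""
    # crc32 is used only because it is deterministic across runs;
    # it stands in for whatever hash a real partitioner would use.
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def distribute(urls, num_partitions):
    """Return how many URLs land in each of num_partitions partitions."""
    counts = Counter()
    for url in urls:
        counts[partition_for(url, num_partitions)] += 1
    return [counts[p] for p in range(num_partitions)]


# Hypothetical fetchlist: 1000 URLs spread over 200 made-up hosts.
urls = [f"http://host{i % 200}.example.com/page{i}" for i in range(1000)]

# 5 partitions, one per node in the cluster from the question.
per_task = distribute(urls, 5)
print(per_task)

# The opposite extreme: every URL on one host ends up in one partition.
single_host = [f"http://a.example.com/page{i}" for i in range(1000)]
print(distribute(single_host, 5))
```

If the real job behaves anything like this, a fetchlist of 1000 URLs from many hosts would spread across all 5 tasks regardless of its size on disk, while a fetchlist dominated by one host would not.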

hth

[0] http://stackoverflow.com/questions/5748585/hadoop-workload

On Tue, Jun 12, 2012 at 10:15 AM, Ali Safdar Kureishy <[email protected]> wrote:
> Hi,
>
> I have a hadoop cluster of 5 nodes. I want to ensure that the fetch phase
> is distributed evenly across all the nodes (to maximize bandwidth etc).
> However, if I generate a fetchlist of size 1000 urls, does this get
> distributed equally across the nodes? Doesn't the fact that the size of the
> fetchlist is < 64MB (block size) result in it being fetched from a single
> node? If not, how is this distributed across the mappers evenly? Is there a
> rough formulate I can use, to determine how many URLs I should fetch for an
> equal distribution across my nodes, for a given block size setting?
>
> Thanks,
> Safdar

--
Lewis

