Thanks, Lewis and Julien, for your input. I will look into this a bit further and reply with some numbers from a run with a fetchlist of 50K URLs. It is late at night in my timezone, so I will look at this first thing in the morning.

Thanks,
Safdar

On Tue, Jun 12, 2012 at 4:56 PM, Julien Nioche <[email protected]> wrote:

> Guys,
>
> This has to do with the way URLs are grouped for politeness and not so much
> with the number of blocks in the input. Limiting the URLs by # host names,
> domains or IP is a way of ensuring an even distribution across the cluster.
> See nutch-default.xml for details
>
> J.
>
> On 12 June 2012 13:06, Lewis John Mcgibbney <[email protected]> wrote:
>
> > Hi Ali,
> >
> > Please check out this post [0] I found. I need to agree with the
> > response in the thread and state that I don't know how Hadoop ensures
> > even distribution of workload but we can assume that by explicitly
> > specifying the mapper and reducers we can ensure that all 'will' be
> > used across your cluster.
> >
> > hth
> >
> > [0] http://stackoverflow.com/questions/5748585/hadoop-workload
> >
> > On Tue, Jun 12, 2012 at 10:15 AM, Ali Safdar Kureishy
> > <[email protected]> wrote:
> > > Hi,
> > >
> > > I have a hadoop cluster of 5 nodes. I want to ensure that the fetch phase
> > > is distributed evenly across all the nodes (to maximize bandwidth etc).
> > > However, if I generate a fetchlist of size 1000 urls, does this get
> > > distributed equally across the nodes? Doesn't the fact that the size of the
> > > fetchlist is < 64MB (block size) result in it being fetched from a single
> > > node? If not, how is this distributed across the mappers evenly? Is there a
> > > rough formula I can use, to determine how many URLs I should fetch for an
> > > equal distribution across my nodes, for a given block size setting?
> > >
> > > Thanks,
> > > Safdar
> >
> > --
> > Lewis
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
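
To make Julien's point about grouping by host concrete, here is a minimal Python sketch. It is not Nutch's actual partitioner, only an illustration of the idea under some assumptions: the function name `partition_by_host`, the CRC32 hash, and the `example` host names are all made up for this example. It shows that when every URL from one host is sent to the same fetch task, the spread across tasks depends on how many distinct hosts the fetchlist contains, not on its size in bytes.

```python
from collections import Counter
from urllib.parse import urlsplit
from zlib import crc32


def partition_by_host(urls, num_tasks):
    """Count how many URLs each fetch task would receive.

    Illustration only (not Nutch code): every URL is assigned to a task
    by hashing its host name, so all URLs of one host share a task and
    that task alone can enforce a per-host politeness delay.
    """
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname or ""
        task = crc32(host.encode("utf-8")) % num_tasks
        counts[task] += 1
    return counts


# Hypothetical fetchlists of 1000 URLs each, split across 5 fetch tasks.
few_hosts = [f"http://site{i % 2}.example/page{i}" for i in range(1000)]
many_hosts = [f"http://site{i}.example/" for i in range(1000)]

print("2 hosts:   ", sorted(partition_by_host(few_hosts, 5).items()))
print("1000 hosts:", sorted(partition_by_host(many_hosts, 5).items()))
```

With only two hosts, at most two of the five tasks receive any URLs, however long the fetchlist is. With many distinct hosts the same 1000 URLs spread across all five tasks. So the thing to check for a given crawl is probably the number of distinct hosts (or domains, or IPs) in the fetchlist relative to the number of fetch tasks.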

