Hi Ali,

Please check out this post [0] I found. I have to agree with the
response in that thread and say that I don't know exactly how Hadoop
ensures an even distribution of the workload, but we can assume that by
explicitly specifying the number of mappers and reducers we can ensure
that all of them 'will' be used across your cluster.
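To put rough numbers on the block-size question, here is a minimal Python sketch. It models how I understand the old org.apache.hadoop.mapred.FileInputFormat to size input splits for one splittable file: split size = max(minSize, min(goalSize, blockSize)), where goalSize = file size / the configured number of map tasks (mapred.map.tasks, which is only a hint). The function names, the 1.1 "slop" allowance for the last split (as I recall it from the Hadoop source) and the 200 KB fetchlist size below are assumptions for illustration, not taken from Nutch. It only counts map tasks, not which nodes run them, and an InputFormat that refuses to split its files gets one map task per file whatever you configure.

```python
SPLIT_SLOP = 1.1  # assumed: the last split may be up to 10% larger than the split size


def split_size(file_bytes: int, map_tasks_hint: int, block_size: int,
               min_split_size: int = 1) -> int:
    """Hypothetical helper: max(minSize, min(goalSize, blockSize))."""
    goal_size = file_bytes // max(map_tasks_hint, 1)  # bytes per requested map task
    return max(min_split_size, min(goal_size, block_size))


def count_splits(file_bytes: int, map_tasks_hint: int, block_size: int,
                 min_split_size: int = 1) -> int:
    """Hypothetical helper: number of splits (map tasks) for one splittable file."""
    if file_bytes <= 0:
        raise ValueError("this sketch only models non-empty files")
    size = split_size(file_bytes, map_tasks_hint, block_size, min_split_size)
    splits, remaining = 0, file_bytes
    # Carve off full-size splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    # Whatever is left becomes one final, possibly shorter (or slightly longer) split.
    if remaining:
        splits += 1
    return splits


MB = 1024 * 1024
fetchlist = 200 * 1024  # hypothetical size of a 1000-URL fetchlist
print(count_splits(fetchlist, 1, 64 * MB))  # one map task
print(count_splits(fetchlist, 5, 64 * MB))  # hint of 5 -> five smaller splits
```

So, under these assumptions, a file much smaller than the block size only fans out to several mappers when the requested number of map tasks pushes the goal size below the block size; otherwise it stays one split.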

hth

[0] http://stackoverflow.com/questions/5748585/hadoop-workload

On Tue, Jun 12, 2012 at 10:15 AM, Ali Safdar Kureishy
<[email protected]> wrote:
> Hi,
>
> I have a hadoop cluster of 5 nodes. I want to ensure that the fetch phase
> is distributed evenly across all the nodes (to maximize bandwidth etc).
> However, if I generate a fetchlist of size 1000 urls, does this get
> distributed equally across the nodes? Doesn't the fact that the size of the
> fetchlist is < 64MB (block size) result in it being fetched from a single
> node? If not, how is this distributed across the mappers evenly? Is there a
> rough formula I can use to determine how many URLs I should fetch for an
> equal distribution across my nodes, for a given block size setting?
>
> Thanks,
> Safdar



-- 
Lewis
