Numbers of mappers and reducers

2009-03-17 Thread Richa Khandelwal
Hi,
I was going through the FAQs on Hadoop to optimize the performance of
map/reduce. There is a suggestion to set the number of reducers to the prime
number closest to the number of nodes, and the number of mappers to the prime
number closest to several times the number of nodes in the cluster.
What performance advantages do these numbers give? Doing so
improved the performance of my map/reduce jobs considerably, and I am
interested to know the principles behind it.

Thanks,
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Re: Numbers of mappers and reducers

2009-03-17 Thread Owen O'Malley


On Mar 17, 2009, at 9:18 AM, Richa Khandelwal wrote:


I was going through FAQs on Hadoop to optimize the performance of
map/reduce. There is a suggestion to set the number of reducers to a prime
number closest to the number of nodes and number of mappers a prime number
closest to several times the number of nodes in the cluster.


There is no need for the number of reduces to be prime. A prime count
only helps if you are using the HashPartitioner and your key's hash
function is too linear. In practice, you usually want to use 99% of
the cluster's reduce capacity.


-- Owen
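
To make the HashPartitioner point concrete, here is a minimal simulation sketch in Python (not Hadoop code). It assumes the partition formula used by Hadoop's HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks, and a made-up "too linear" hash in which every key hashes to a multiple of 4. The helper names (partition, reduce_loads, suggested_reduces), the key set, and the slot counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def partition(key_hash, num_reduces):
    # Same arithmetic as HashPartitioner: clear the sign bit, then take
    # the remainder modulo the number of reduce tasks.
    return (key_hash & 0x7FFFFFFF) % num_reduces


def reduce_loads(key_hashes, num_reduces):
    # Count how many keys land on each reduce task.
    counts = Counter(partition(h, num_reduces) for h in key_hashes)
    return [counts.get(r, 0) for r in range(num_reduces)]


def suggested_reduces(nodes, reduce_slots_per_node, percent=99):
    # Assumption: reduce capacity = nodes * reduce slots per node.
    # Integer arithmetic avoids floating-point rounding surprises.
    return max(1, nodes * reduce_slots_per_node * percent // 100)


# Hypothetical "too linear" hash: every hash value is a multiple of 4.
linear_hashes = [4 * i for i in range(1200)]

loads_8 = reduce_loads(linear_hashes, 8)  # 8 shares a factor with 4
loads_7 = reduce_loads(linear_hashes, 7)  # 7 is prime

print("8 reduces:", loads_8)
print("7 reduces:", loads_7)
print("reduces for 10 nodes x 2 slots:", suggested_reduces(10, 2))
```

With 8 reduces, only partitions 0 and 4 receive keys, because the hash stride and the reduce count share a factor. With 7 reduces the keys spread almost evenly. Any reduce count with no common factor with the stride would do the same; a prime is simply a convenient way to get one. With a well-mixed hash function the choice does not matter, and sizing the job close to the cluster's reduce capacity is the more useful rule.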