Numbers of mappers and reducers
Hi,

I was going through the FAQs on Hadoop to optimize the performance of map/reduce. There is a suggestion to set the number of reducers to a prime number closest to the number of nodes, and the number of mappers to a prime number closest to several times the number of nodes in the cluster.

What performance advantages do these numbers give? Doing so did improve the performance of my map/reduce jobs considerably, and I am interested to know the principles behind it.

Thanks,
Richa Khandelwal
University Of California, Santa Cruz.
Ph:425-241-7763
Re: Numbers of mappers and reducers
On Mar 17, 2009, at 9:18 AM, Richa Khandelwal wrote:

> I was going through FAQs on Hadoop to optimize the performance of map/reduce. There is a suggestion to set the number of reducers to a prime number closest to the number of nodes and number of mappers a prime number closest to several times the number of nodes in the cluster.

There is no need for the number of reduces to be prime. The only thing it helps is if you are using the HashPartitioner and your key's hash function is too linear. In practice, you usually want to use 99% of your reduce capacity of the cluster.

-- Owen
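To illustrate the "too linear" case, here is a minimal Python sketch that simulates the partitioning step rather than running Hadoop. It assumes the partition formula `(hash & Integer.MAX_VALUE) % numReduceTasks` used by HashPartitioner; the names `partition` and `reducer_load` and the key set (every hash a multiple of 4) are hypothetical, chosen only to make the effect visible.

```python
from collections import Counter

def partition(key_hash, num_reduces):
    # Assumed to mirror HashPartitioner:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (key_hash & 0x7FFFFFFF) % num_reduces

def reducer_load(key_hashes, num_reduces):
    # Count how many keys land on each reducer index.
    counts = Counter(partition(h, num_reduces) for h in key_hashes)
    return [counts.get(r, 0) for r in range(num_reduces)]

# Hypothetical "too linear" hash: every key hashes to a multiple of 4.
hashes = [4 * i for i in range(1000)]

# 8 reduces share the factor 4 with the hashes: only reducers 0 and 4 get keys.
print(reducer_load(hashes, 8))  # [500, 0, 0, 0, 500, 0, 0, 0]

# 7 reduces (prime) share no factor with the stride: the load evens out.
print(reducer_load(hashes, 7))  # [143, 143, 143, 142, 143, 143, 143]
```

With a hash function that already spreads keys well, the two counts would behave about the same, which is why a prime count is not needed in general.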