Relationship between number of reducers and number of regions in the table

Shahab Yunus Thu, 14 Aug 2014 05:24:08 -0700

I couldn't decide that whether it is an HBase question or Hadoop/Yarn.

In the utility class for MR jobs integerated with HBase,
*org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil, *


in the method:


*public static void initTableReducerJob(String table,*
*    Class<? extends TableReducer> reducer, Job job,*
*    Class partitioner, String quorumAddress, String serverClass,*
*    String serverImpl, boolean addDependencyJars) throws IOException;*

Im the above method the following check is added, while setting the number
of reducers that:


*...*
*int regions = outputTable.getRegionsInfo().size();*
*...*
*if (job.getNumReduceTasks() > regions) {*
*        job.setNumReduceTasks(outputTable.getRegionsInfo().size());*
*      }*
*...      *

What is the reason for doing this? And what are the negative effects we
don't follow this? I can think one that, in case of more than one reducer
writing/reading a same region can cause hot-spotting and performance
issues. Are there any other reasons to add this check as well?

Thanks a lot.

Regards,
Shahab

Relationship between number of reducers and number of regions in the table

Reply via email to