Dhaval,

You might find https://issues.apache.org/jira/browse/HBASE-4063 useful when it is ready. Of course, you can always use your own customized version of TableInputFormat: https://issues.apache.org/jira/browse/HBASE-4039 allows you to provide your own TableInputFormat to TableMapReduceUtil.
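As a rough illustration of what a customized TableInputFormat could do to get more than one map per region, the sketch below cuts a region's [startKey, endKey) range into several sub-ranges by treating the keys as unsigned integers. It is not taken from either JIRA above and uses no HBase classes; the class name KeyRangeSplitter and the method interiorSplitPoints are hypothetical. It also assumes both keys are non-empty and that row keys are spread fairly evenly over the byte range, which may not hold for your table.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: computes evenly spaced row keys
// strictly between a region's start and end key.
final class KeyRangeSplitter {

    // Returns up to (parts - 1) interior keys in ascending order. Together with
    // the original start and end keys they delimit the sub-ranges
    // [start, p1), [p1, p2), ..., [pLast, end).
    static List<byte[]> interiorSplitPoints(byte[] start, byte[] end, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        // Right-pad the shorter key with zero bytes so both have the same width.
        int width = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
        if (lo.compareTo(hi) >= 0) {
            // Also rejects the empty keys of the first and last region.
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger span = hi.subtract(lo);
        BigInteger n = BigInteger.valueOf(parts);
        List<byte[]> points = new ArrayList<>();
        BigInteger previous = lo;
        for (int i = 1; i < parts; i++) {
            BigInteger point = lo.add(span.multiply(BigInteger.valueOf(i)).divide(n));
            // Skip points that do not advance (range narrower than 'parts').
            if (point.compareTo(previous) > 0) {
                points.add(toFixedWidth(point, width));
                previous = point;
            }
        }
        return points;
    }

    // Converts a non-negative value to exactly 'width' bytes, big-endian.
    private static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

A custom input format would presumably call something like this from its getSplits() override, once per region, and emit one split per sub-range instead of one per region. That wiring is omitted here because it depends on the HBase version in use.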
Ming

-----Original Message-----
From: Dhaval Makawana [mailto:[email protected]]
Sent: Sunday, August 28, 2011 2:06 AM
To: [email protected]
Subject: Number of map jobs per region

Hi,

We have 31 regions for a table in our HBase system, so scanning the table via TableMapper creates 31 map tasks. The following line from the documentation explains why:

"Reading from HBase, the TableInputFormat asks HBase for the list of regions and makes a map-per-region or mapred.map.tasks maps, whichever is smaller"
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html)

Each region holds almost 7 GB of LZO-compressed data, and the map tasks are taking a very long time to process it. Is there any way to increase parallelism (allocate more maps per region)?

Regards,
Dhaval
