Dhaval,

You might find https://issues.apache.org/jira/browse/HBASE-4063 useful when it 
is ready. Of course, you can always use your own customized version of 
TableInputFormat. https://issues.apache.org/jira/browse/HBASE-4039 allows you 
to provide your own TableInputFormat to TableMapReduceUtil.
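
As a rough illustration of what a customized TableInputFormat could do, the sketch below cuts one region's row-key range into several contiguous sub-ranges, so that each sub-range could back its own input split (31 regions cut 4 ways would give 124 maps). It is plain Java with no HBase dependency. The class name RegionSplitSketch, the method splitRange, and the fixed key width are all hypothetical and only for illustration. It also assumes row keys are spread fairly evenly over the byte range, which may not hold for your data. In a real job, an overridden getSplits() would turn each sub-range into a split.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: divides a region's key range evenly.
class RegionSplitSketch {

    // Right-pads (or truncates) a key to `width` bytes so it can be read as a number.
    static byte[] pad(byte[] key, int width) {
        byte[] out = new byte[width];
        Arrays.fill(out, (byte) 0x00);
        System.arraycopy(key, 0, out, 0, Math.min(key.length, width));
        return out;
    }

    // Converts a non-negative number below 2^(8*width) back to a `width`-byte key.
    static byte[] toKey(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be short
        byte[] out = new byte[width];
        int n = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - n, out, width - n, n);
        return out;
    }

    // Returns `pieces` pairs of {startKey, endKey} covering [start, end).
    // An empty end key means "up to the end of the table", as for the last region.
    // Assumption: keys are compared on their first `width` bytes only.
    static List<byte[][]> splitRange(byte[] start, byte[] end, int pieces, int width) {
        BigInteger lo = new BigInteger(1, pad(start, width));
        BigInteger hi = end.length == 0
                ? BigInteger.ONE.shiftLeft(8 * width)
                : new BigInteger(1, pad(end, width));
        BigInteger span = hi.subtract(lo);
        BigInteger count = BigInteger.valueOf(pieces);

        List<byte[][]> ranges = new ArrayList<>();
        for (int i = 0; i < pieces; i++) {
            BigInteger from = lo.add(span.multiply(BigInteger.valueOf(i)).divide(count));
            BigInteger to = lo.add(span.multiply(BigInteger.valueOf(i + 1)).divide(count));
            // Keep the region's original boundaries at the outer edges.
            byte[] s = (i == 0) ? start : toKey(from, width);
            byte[] e = (i == pieces - 1) ? end : toKey(to, width);
            ranges.add(new byte[][] {s, e});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: one region covering keys [0x00, 0x40), cut into 4 sub-ranges.
        List<byte[][]> ranges = splitRange(new byte[] {0x00}, new byte[] {0x40}, 4, 1);
        for (byte[][] r : ranges) {
            System.out.println(Arrays.toString(r[0]) + " .. " + Arrays.toString(r[1]));
        }
    }
}
```

If a region's key range is narrower than the number of pieces, some sub-ranges come out empty, so a real implementation would want to drop those. Even division by key value also says nothing about how much data falls into each sub-range.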

Ming

-----Original Message-----
From: Dhaval Makawana [mailto:[email protected]] 
Sent: Sunday, August 28, 2011 2:06 AM
To: [email protected]
Subject: Number of map jobs per region

Hi,

We have 31 regions for a table in our HBase system, so scanning the table
via TableMapper creates 31 map tasks. The following line from the
documentation explains why:

"Reading from HBase, the TableInputFormat asks HBase for the list of regions
and makes a map-per-region or mapred.map.tasks maps, whichever is smaller "
(
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html
)

Each region is almost 7 GB (LZO-compressed data), and the map tasks are
taking a very long time to process the data. Is there any way to increase
parallelism (allocate more maps per region)?

Regards,
Dhaval
