RE: Number of map jobs per region

Ma, Ming Sun, 28 Aug 2011 22:00:30 -0700

Dhaval,

You might find https://issues.apache.org/jira/browse/HBASE-4063 useful when it 
is ready. Of course, you can always use your own customized version of 
TableInputFormat. https://issues.apache.org/jira/browse/HBASE-4039 allows you 
to provide your own TableInputFormat to TableMapReduceUtil.
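
To make the custom TableInputFormat idea a bit more concrete, here is a minimal
sketch of the one piece of arithmetic such a class would need: cutting a single
region's [startKey, endKey) range into several smaller ranges, so that one map
task can be created per sub-range instead of one per region. This is plain Java
and not part of HBase. The class RegionRangeSplitter and its method boundaries()
are hypothetical names made up for illustration. It assumes both keys are
non-empty and that row keys are spread roughly evenly over the range, which may
not hold for your table.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not an HBase class): subdivides one region's key range. */
final class RegionRangeSplitter {

    /**
     * Returns boundary keys from start to end, both included, spaced evenly
     * when keys are read as unsigned big-endian integers. Adjacent pairs of
     * the result describe the sub-ranges; there are at most n of them.
     * Assumption: start and end are non-empty and start sorts before end.
     * The first and last region of a real table have empty boundary keys
     * and would need separate handling.
     */
    static List<byte[]> boundaries(byte[] start, byte[] end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int width = Math.max(start.length, end.length);
        // Right-pad the shorter key with zero bytes so both have the same width.
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[]> keys = new ArrayList<>();
        keys.add(start);
        BigInteger prev = lo;
        for (int i = 1; i < n; i++) {
            BigInteger k = lo.add(
                span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            // Skip repeated keys when the range is too narrow for n distinct cuts.
            if (k.compareTo(prev) > 0) {
                keys.add(toFixedWidth(k, width));
                prev = k;
            }
        }
        keys.add(end);
        return keys;
    }

    /** Converts a non-negative value back to exactly width bytes, big-endian. */
    private static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

For example, boundaries(new byte[]{0x00}, new byte[]{0x10}, 4) gives the keys
0x00, 0x04, 0x08, 0x0C and 0x10, i.e. four scan ranges. In a customized
TableInputFormat, the getSplits() override would be the natural place to turn
each adjacent pair of keys into its own split, though the details depend on
your HBase version.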

Ming

-----Original Message-----
From: Dhaval Makawana [mailto:[email protected]] 
Sent: Sunday, August 28, 2011 2:06 AM
To: [email protected]
Subject: Number of map jobs per region

Hi,

We have 31 regions for a table in our HBase system, so scanning the table
via TableMapper creates 31 map tasks. The following line from the
documentation explains why:

"Reading from HBase, the TableInputFormat asks HBase for the list of regions
and makes a map-per-region or mapred.map.tasks maps, whichever is smaller"
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html)

Each region is almost 7 GB (LZO-compressed data), and the map tasks are
taking a very long time to process it. Is there any way to increase
parallelism (allocate more maps per region)?

Regards,
Dhaval
