Hi,

We have 31 regions for a table in our HBase system, so scanning the table
via a TableMapper creates 31 map tasks. The following line from the
documentation explains why:

"Reading from HBase, the TableInputFormat asks HBase for the list of regions
and makes a map-per-region or mapred.map.tasks maps, whichever is smaller "
(
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html
)
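
For reference, the job is set up roughly like the sketch below (Hadoop 2-style
API; the table name, mapper, and scan settings are placeholders rather than our
actual code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ScanJob {

    // Placeholder mapper: the real one processes each row's Result.
    static class MyMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context) {
            // per-row processing goes here
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "scan-job");
        job.setJarByClass(ScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // larger scanner caching for a full-table scan
        scan.setCacheBlocks(false);  // don't pollute the block cache

        // TableInputFormat behind this call creates one split, and hence one
        // map task, per region of "my_table" (placeholder table name).
        TableMapReduceUtil.initTableMapperJob(
                "my_table",
                scan,
                MyMapper.class,
                NullWritable.class,
                NullWritable.class,
                job);

        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}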

Each region is almost 7 GB (LZO-compressed data), and the map tasks are taking
a very long time to process it. Is there any way to increase parallelism,
i.e. allocate more than one map per region? A rough sketch of what I have in
mind follows below.
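
To make the idea concrete, here is an untested sketch that subclasses
TableInputFormat and cuts each region's split into several key ranges.
SubSplittingTableInputFormat and SUBSPLITS_PER_REGION are made-up names, and
the TableSplit constructor/getters differ between HBase versions, so please
treat this only as an illustration of "more maps per region":

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

/**
 * Hypothetical input format that cuts every per-region split into
 * SUBSPLITS_PER_REGION key ranges, so each region is scanned by
 * several map tasks instead of one.
 */
public class SubSplittingTableInputFormat extends TableInputFormat {

    private static final int SUBSPLITS_PER_REGION = 4; // arbitrary example value

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
        List<InputSplit> regionSplits = super.getSplits(context); // one per region
        List<InputSplit> subSplits = new ArrayList<InputSplit>();

        for (InputSplit s : regionSplits) {
            TableSplit ts = (TableSplit) s;
            byte[] start = ts.getStartRow();
            byte[] end = ts.getEndRow();

            // Bytes.split cannot interpolate against the empty row keys used by
            // the first and last regions, so keep those as single splits.
            if (start.length == 0 || end.length == 0) {
                subSplits.add(ts);
                continue;
            }

            // Returns start, (SUBSPLITS_PER_REGION - 1) intermediate keys, end;
            // may return null if the range is too narrow to split.
            byte[][] keys = Bytes.split(start, end, SUBSPLITS_PER_REGION - 1);
            if (keys == null) {
                subSplits.add(ts);
                continue;
            }
            for (int i = 0; i < keys.length - 1; i++) {
                subSplits.add(new TableSplit(ts.getTableName(), keys[i], keys[i + 1],
                        ts.getRegionLocation()));
            }
        }
        return subSplits;
    }
}

I imagine it would be plugged in with job.setInputFormatClass(SubSplittingTableInputFormat.class)
after initTableMapperJob. Is something along these lines a reasonable approach,
or is there a built-in way to get more maps per region?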

Regards,
Dhaval
