Hi,

We have 31 regions for a table in our HBase system, so scanning the table via TableMapper creates 31 map tasks. The following line from the documentation explains why:
"Reading from HBase, the TableInputFormat asks HBase for the list of regions and makes a map-per-region or mapred.map.tasks maps, whichever is smaller " ( http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html ) Each region file size is almost 7 GB(lzo compressed data) and map jobs are taking huge time to processed the data. Is there any way to increase parallelism(allocate more maps per region)? Regards, Dhaval
