Try: set hbase.client.scanner.caching=5000;
Also, check that you are getting the expected locality, so that mappers run on the same nodes as the region servers they are scanning (assuming you are running HBase and MapReduce on the same cluster). When I was testing this, I ran into this problem (though it may have been specific to our cluster configuration): https://issues.apache.org/jira/browse/HBASE-2535

JVS

On Dec 9, 2010, at 10:46 PM, vlisovsky wrote:

> Hi Guys,
> Wonder if anybody could shed some light on how to reduce the load on the HBase
> cluster when running a full scan.
> The need is to dump everything I have in HBase into a Hive table. The
> HBase data size is around 500g.
> The job creates 9000 mappers; after about 1000 maps, things go south every
> time.
> If I run the insert below, it runs for about 30 minutes, then starts bringing
> down the HBase cluster, after which the region servers need to be restarted.
> Wonder if there is a way to throttle it somehow, or otherwise if there is any
> other method of getting structured data out?
> Any help is appreciated,
> Thanks,
> -Vitaly
>
> create external table hbase_linked_table (
>   mykey string,
>   info map<string, string>
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
> TBLPROPERTIES ("hbase.table.name" = "hbase_table2");
>
> set hive.exec.compress.output=true;
> set io.seqfile.compression.type=BLOCK;
> set mapred.output.compression.type=BLOCK;
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> set mapred.reduce.tasks=40;
> set mapred.map.tasks=25;
>
> INSERT OVERWRITE TABLE tmp_hive_destination
> SELECT * FROM hbase_linked_table;
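For what it's worth, the caching setting is a per-session Hive property, so it would go alongside the other `set` commands in the same session that runs the insert. A minimal sketch against the table names from the original message (the value 5000 is illustrative; tune it for your row size and region server memory, since each open scanner buffers that many rows per RPC):

```sql
-- Larger scanner caching = fewer round trips per mapper, but more
-- memory held per open scanner on the region server side.
set hbase.client.scanner.caching=5000;

-- Compression settings from the original job, unchanged.
set hive.exec.compress.output=true;
set io.seqfile.compression.type=BLOCK;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

INSERT OVERWRITE TABLE tmp_hive_destination
SELECT * FROM hbase_linked_table;
```

Note that `set mapred.map.tasks` is only a hint to the InputFormat; with the HBase storage handler, the mapper count is driven by the number of regions, which is why the job ends up with ~9000 mappers regardless.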