Thanks for the info. Also, how can we make sure that our region servers are running on the same nodes as the datanodes holding their data (locality)? Is there a way to verify this?
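One way to spot-check this (a sketch only; it assumes the default HBase root directory /hbase and the table name hbase_table2 from the thread below, so adjust the path to your cluster's layout):

  # List which datanodes hold the blocks under the table's HDFS directory
  hadoop fsck /hbase/hbase_table2 -files -blocks -locations

  # Compare those datanode hosts against the region assignments shown on
  # the HBase master web UI (port 60010 by default).

Locality also drifts over time as regions move or nodes fail; a major compaction rewrites each region's store files on whichever region server currently hosts it, which restores locality. From the hbase shell: major_compact 'hbase_table2'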
On Thu, Dec 9, 2010 at 11:09 PM, John Sichi <jsi...@fb.com> wrote:
> Try
>
> set hbase.client.scanner.caching=5000;
>
> Also, check to make sure that you are getting the expected locality so that
> mappers are running on the same nodes as the region servers they are
> scanning (assuming that you are running HBase and mapreduce on the same
> cluster). When I was testing this, I encountered this problem (but it may
> have been specific to our cluster configuration):
>
> https://issues.apache.org/jira/browse/HBASE-2535
>
> JVS
>
> On Dec 9, 2010, at 10:46 PM, vlisovsky wrote:
>
> > Hi Guys,
> > Wondering if anybody could shed some light on how to reduce the load on
> > the HBase cluster when running a full scan.
> > The need is to dump everything I have in HBase into a Hive table. The
> > HBase data size is around 500g.
> > The job creates 9000 mappers; after about 1000 maps, things go south
> > every time.
> > If I run the insert below, it runs for about 30 minutes, then starts
> > bringing down the HBase cluster, after which the region servers need to
> > be restarted.
> > Wonder if there is a way to throttle it somehow, or otherwise if there
> > is any other method of getting structured data out?
> > Any help is appreciated,
> > Thanks,
> > -Vitaly
> >
> > create external table hbase_linked_table (
> >   mykey string,
> >   info map<string, string>
> > )
> > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
> > TBLPROPERTIES ("hbase.table.name" = "hbase_table2");
> >
> > set hive.exec.compress.output=true;
> > set io.seqfile.compression.type=BLOCK;
> > set mapred.output.compression.type=BLOCK;
> > set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
> >
> > set mapred.reduce.tasks=40;
> > set mapred.map.tasks=25;
> >
> > INSERT overwrite table tmp_hive_destination
> > select * from hbase_linked_table;
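For reference, here is how John's suggestion fits into the dump session (a sketch only; 5000 is his example value, and the right caching level depends on your row sizes and region server memory):

  -- fetch 5000 rows per scanner RPC instead of the default of 1;
  -- fewer round trips per mapper, but more memory held per batch
  set hbase.client.scanner.caching=5000;

  set hive.exec.compress.output=true;
  set mapred.output.compression.type=BLOCK;
  set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

  INSERT OVERWRITE TABLE tmp_hive_destination
  SELECT * FROM hbase_linked_table;

Note that with the HBase storage handler the number of mappers is driven by the region count, not by mapred.map.tasks, so raising the per-RPC batch size is the main knob this session gives you over scan pressure.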