Try: set hbase.client.scanner.caching=5000;
Also, check that you are getting the expected locality, so that mappers run on the same nodes as the region servers they are scanning (assuming you are running HBase and MapReduce on the same cluster). When I was testing this, I ran into this problem (though it may have been specific to our cluster configuration): https://issues.apache.org/jira/browse/HBASE-2535

JVS

On Dec 9, 2010, at 10:46 PM, vlisovsky wrote:

> Hi Guys,
> Wonder if anybody could shed some light on how to reduce the load on the HBase
> cluster when running a full scan.
> The need is to dump everything I have in HBase into a Hive table. The
> HBase data size is around 500g.
> The job creates 9000 mappers; after about 1000 maps, things go south every
> time.
> If I run the insert below, it runs for about 30 minutes, then starts bringing
> down the HBase cluster, after which the region servers need to be restarted.
> Wonder if there is a way to throttle it somehow, or otherwise if there is any
> other method of getting structured data out?
> Any help is appreciated,
> Thanks,
> -Vitaly
>
> create external table hbase_linked_table (
>   mykey string,
>   info map<string, string>
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
> TBLPROPERTIES ("hbase.table.name" = "hbase_table2");
>
> set hive.exec.compress.output=true;
> set io.seqfile.compression.type=BLOCK;
> set mapred.output.compression.type=BLOCK;
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> set mapred.reduce.tasks=40;
> set mapred.map.tasks=25;
>
> INSERT OVERWRITE TABLE tmp_hive_destination
> SELECT * FROM hbase_linked_table;
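For what it's worth, the caching setting is a per-session Hive property, so it would go alongside the other `set` commands in the same session that runs the insert. A minimal sketch against the table names from the original message (the value 5000 is illustrative; tune it for your row size and region server memory, since each open scanner buffers that many rows per RPC):

```sql
-- Larger scanner caching = fewer round trips per mapper, but more
-- memory held per open scanner on the region server side.
set hbase.client.scanner.caching=5000;

-- Compression settings from the original job, unchanged.
set hive.exec.compress.output=true;
set io.seqfile.compression.type=BLOCK;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

INSERT OVERWRITE TABLE tmp_hive_destination
SELECT * FROM hbase_linked_table;
```

Note that `set mapred.map.tasks` is only a hint to the InputFormat; with the HBase storage handler, the mapper count is driven by the number of regions, which is why the job ends up with ~9000 mappers regardless.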