Thanks for the info. Also, how can we make sure that our region servers
are co-located with the same datanodes (locality)? Is there a way we can
verify this?
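One rough way to eyeball this (a sketch only; the default /hbase root
dir and the table name below are assumptions based on this thread):

```shell
# Rough locality check (paths and table name are assumed, not verified).
# 1. Note which host serves each region, e.g. from the HBase master
#    web UI or the `status 'detailed'` output in `hbase shell`.
# 2. List which datanodes hold that table's HFile blocks:
hadoop fsck /hbase/hbase_table2 -files -blocks -locations
# If a region's server host shows up among the block locations for
# that region's files, scans against it are (mostly) local.
# Locality also tends to improve after a major compaction, since
# compaction rewrites the HFiles through the local datanode:
echo "major_compact 'hbase_table2'" | hbase shell
```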

On Thu, Dec 9, 2010 at 11:09 PM, John Sichi <jsi...@fb.com> wrote:

> Try
>
> set hbase.client.scanner.caching=5000;
>
> Also, check to make sure that you are getting the expected locality so that
> mappers are running on the same nodes as the region servers they are
> scanning (assuming that you are running HBase and mapreduce on the same
> cluster).  When I was testing this, I encountered this problem (but it may
> have been specific to our cluster configurations):
>
> https://issues.apache.org/jira/browse/HBASE-2535
>
> JVS
>
> On Dec 9, 2010, at 10:46 PM, vlisovsky wrote:
>
> >
> > Hi Guys,
> > Wonder if anybody could shed some light on how to reduce the load on
> > the HBase cluster when running a full scan.
> > The need is to dump everything I have in HBase into a Hive table. The
> > HBase data size is around 500 GB.
> > The job creates 9000 mappers; after about 1000 maps, things go south
> > every time.
> > If I run the insert below, it runs for about 30 minutes and then
> > starts bringing down the HBase cluster, after which the region
> > servers need to be restarted.
> > Wonder if there is a way to throttle it somehow, or otherwise if
> > there is any other method of getting structured data out?
> > Any help is appreciated,
> > Thanks,
> > -Vitaly
> >
> > create external table hbase_linked_table (
> > mykey        string,
> > info        map<string, string>
> > )
> > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> > WITH
> > SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
> > TBLPROPERTIES ("hbase.table.name" = "hbase_table2");
> >
> > set hive.exec.compress.output=true;
> > set io.seqfile.compression.type=BLOCK;
> > set mapred.output.compression.type=BLOCK;
> > set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
> >
> > set mapred.reduce.tasks=40;
> > set mapred.map.tasks=25;
> >
> > INSERT overwrite table tmp_hive_destination
> > select * from hbase_linked_table;
> >
>
>
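For reference, John's scanner-caching suggestion can be folded into the
export itself. A sketch (the value 1000 is an illustrative starting
point to tune downward if region servers still struggle, not a tested
recommendation):

```shell
# Sketch of the export with a more conservative scanner cache.
# hbase.client.scanner.caching controls rows fetched per RPC: higher
# values mean fewer round trips but more memory per fetch and
# longer-held scanner leases on the region servers.
hive -e "
set hbase.client.scanner.caching=1000;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
INSERT OVERWRITE TABLE tmp_hive_destination
SELECT * FROM hbase_linked_table;
"
```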
