Re: loading data in HBase table using APIs

Doug Meil Thu, 04 Aug 2011 08:23:28 -0700

David, thanks for the tip on this.  I just checked in a reorg to the
performance chapter and included this tip.


Stack does the website updating so it's not visible yet, but this tip is
in there.

Thanks!




On 7/18/11 6:18 PM, "Buttler, David" <[email protected]> wrote:

>After a quick scan of the performance section, I didn't see what I
>consider to be a huge performance consideration:
>If at all possible, don't do a reduce on your puts.  The shuffle/sort
>part of the map/reduce paradigm is often useless if all you are trying to
>do is insert/update data in HBase.  From the OP's description it sounds
>like he doesn't need to have any kind of reduce phase [and may be a great
>candidate for bulk loading and the pre-creation of regions].  In any
>case, don't reduce if you can avoid it.
>
>Dave
>
>-----Original Message-----
>From: Doug Meil [mailto:[email protected]]
>Sent: Sunday, July 17, 2011 4:40 PM
>To: [email protected]
>Subject: Re: loading data in HBase table using APIs
>
>
>Hi there-
>
>Take a look at this for starters:
>http://hbase.apache.org/book.html#schema
>
>1)  double-check your row-keys (sanity check), that's in the Schema Design
>chapter.
>
>http://hbase.apache.org/book.html#performance
>
>
>2)  if not using bulk-load - pre-create regions, do this regardless of
>using MR or non-MR.
>
>3)  if you are not using an MR job and are using multiple threads with the
>Java API, take a look at HTableUtil.  It's on trunk, but that utility can
>help you.
>
>
>
>
>
>
>On 7/17/11 4:08 PM, "abhay ratnaparkhi" <[email protected]>
>wrote:
>
>>Hello,
>>
>>I am loading a lot of data into an HBase table using the HBase Java API.
>>If I convert this code to a map-reduce job and use the *TableOutputFormat*
>>class, will I get any performance improvement?
>>
>>Since my input data does not come from an existing HBase table or from
>>HDFS files, there will be no input to the map tasks. The only advantage
>>I can see is that multiple map tasks running simultaneously might make
>>processing faster.
>>
>>Thanks!
>>Regards,
>>Abhay
>
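
For anyone reading the thread later, the "pre-create regions" advice in
point 2 above can be sketched roughly as follows. This is only an
illustration, not code from the HBase book: it assumes row keys start with
an 8-character lowercase hex prefix (for example, part of a hash), and the
class and method names (SplitKeys, hexSplits) are made up for this example.
The resulting keys are the kind of byte[][] you would hand to
HBaseAdmin.createTable(HTableDescriptor, byte[][]) so the table starts out
with several regions instead of one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: computes evenly spaced split keys for a table whose
// row keys are assumed to begin with an 8-character lowercase hex prefix.
class SplitKeys {

    // Returns numRegions - 1 split points that divide the 32-bit hex key
    // space [00000000, ffffffff] into numRegions roughly equal ranges.
    static byte[][] hexSplits(int numRegions) {
        if (numRegions < 1) {
            throw new IllegalArgumentException("numRegions must be >= 1");
        }
        long keySpace = 0x100000000L; // 2^32 possible 8-digit hex prefixes
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            long point = (keySpace * i) / numRegions;
            // Zero-padded so that byte-wise ordering matches numeric ordering.
            String key = String.format("%08x", point);
            splits[i - 1] = key.getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the split keys for a table pre-created with 4 regions.
        for (byte[] split : hexSplits(4)) {
            System.out.println(new String(split, StandardCharsets.US_ASCII));
        }
    }
}
```

If your row keys are not uniformly distributed hex strings, evenly spaced
split points like these will not balance the load; in that case the split
keys should come from a sample of the real keys instead.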
