RE: storefileIndexsize

Steinmaurer Thomas Mon, 03 Oct 2011 02:14:56 -0700

Hi!

Thanks for your comments and the link. We will have a mix of bulk
processing via Map/Reduce and random reads through the RowKey via a
Thrift/Java API client.


Thanks,
Thomas


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of
Jean-Daniel Cryans
Sent: Freitag, 30. September 2011 19:45
To: [email protected]
Subject: Re: storefileIndexsize

From the discussion in HBASE-3551, you can compute the numbers you
need. This comment is important:

https://issues.apache.org/jira/browse/HBASE-3551?focusedCommentId=13005272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13005272

You can use the HFile tool too on your own files to see the current
situation.

Changing the block size is easy and should fix your problem. The only
thing to keep in mind is that if you are doing a lot of random reads,
you're probably going to thrash your block cache a lot, along with having
to fetch more data than you need. But with the number of rows that you
have, something tells me that's not your case.
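
As a rough back-of-the-envelope sketch (not taken from HBASE-3551), assume
the block index holds one entry per data block, so that its size grows
linearly with the amount of data and shrinks in proportion to the block
size. Average key size and bytes per row are assumed to stay constant. The
function name and parameters are hypothetical; the 1300M, 1.8 billion and
64K figures are the ones quoted from the original question.

```python
def scaled_index_mb(current_index_mb, current_rows, new_rows,
                    current_block_kb=64, new_block_kb=64):
    """Scale an observed index size to a new row count and block size.

    Assumption: one index entry per data block, with constant key size
    and constant bytes per row, so index size ~ rows / block size.
    """
    row_factor = new_rows / current_rows
    block_factor = current_block_kb / new_block_kb
    return current_index_mb * row_factor * block_factor


# Doubling the rows at the default 64K block size.
same_blocks = scaled_index_mb(1300, 1.8e9, 3.6e9)

# Doubling the rows while moving to 256K blocks.
bigger_blocks = scaled_index_mb(1300, 1.8e9, 3.6e9, new_block_kb=256)

print(same_blocks, bigger_blocks)
```

Under these assumptions the index roughly doubles with the row count and
drops to a quarter of that with 256K blocks. Real numbers will differ, so
checking actual files with the HFile tool is still the better guide.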

J-D

On Thu, Sep 29, 2011 at 10:25 PM, Steinmaurer Thomas
<[email protected]> wrote:
> Hello,
>
>
>
> In a prototypical cluster we have 8 region servers with 4G HBase heap
> space. Each region server has about 107 regions, with a region size of
> 1G using Snappy as compression codec. The table has ~1.8 billion rows
> with a 48-character row key and measurement values as cell values, so
> values are rather small. Currently, the storefileIndexsize for each
> region server is ~1300M. We are afraid that, with an increasing
> number of rows, we will need quite an amount of RAM per RS just for
> holding the index. Is this somehow linear, e.g. if we double the number
> of rows to ~3.6 billion, will we have around 2600M storefileIndexsize?
>
>
>
> I found a few references discussing storefileIndexsize:
>
> http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&subj=a+question+storefileIndexSize
>
> http://hbase.apache.org/book.html#keysize
>
>
>
> The basic suggestion is to increase the block size (we currently use 
> the default 64K) and to reduce the length of the row-key, column 
> family and qualifier names. Are there more?
>
>
>
> True, in our prototypical implementation we have used rather readable
> names for column families and qualifiers. Does anybody have numbers
> from practice on how much storefileIndexsize decreased with shorter
> column family and qualifier names?
>
>
>
> Thanks,
>
> Thomas
>
>
>
>
