From the discussion in HBASE-3551, you can compute the numbers you
need. This comment is particularly important:

https://issues.apache.org/jira/browse/HBASE-3551?focusedCommentId=13005272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13005272

You can also run the HFile tool on your own files to see the current situation.

Changing the block size is easy and should fix your problem. The only
thing to keep in mind is that if you are doing a lot of random reads,
you're probably going to thrash your block cache a lot, along with
having to fetch more data than you need. But with the number of rows
that you have, something tells me that's not your case.
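
To make the block size trade-off concrete, here is a rough
back-of-the-envelope sketch. It assumes one index entry per block,
each holding the block's first key plus a small fixed overhead; the
function name, the 12-byte overhead and the sample sizes are
illustrative assumptions, not values taken from HBase or from your
cluster, and JVM object overhead is ignored. As far as I know the
block size applies to uncompressed data, so plug in the uncompressed
size rather than the on-disk size.

```python
def estimate_index_bytes(uncompressed_bytes, block_size, avg_key_bytes,
                         entry_overhead=12):
    """Rough block index size, assuming one index entry per block.

    entry_overhead is a hypothetical per-entry cost (for example an
    8-byte offset plus a 4-byte length); real overhead will differ.
    """
    # Ceiling division: a partially filled last block still needs an entry.
    blocks = -(-uncompressed_bytes // block_size)
    return blocks * (avg_key_bytes + entry_overhead)


if __name__ == "__main__":
    data = 1 << 40   # hypothetical 1 TiB of uncompressed data
    key = 80         # hypothetical average key size in bytes
    for kb in (64, 128, 256):
        size = estimate_index_bytes(data, kb * 1024, key)
        print(f"{kb}K blocks: ~{size / 2**20:.0f} MiB of index")
```

Under these assumptions the estimate grows linearly with the amount
of data and shrinks in proportion to the block size, and shorter keys
reduce the cost of every entry.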

J-D

On Thu, Sep 29, 2011 at 10:25 PM, Steinmaurer Thomas
<[email protected]> wrote:
> Hello,
>
>
>
> In a prototypical cluster we have 8 region servers with 4G HBase heap
> space. Each region server has about 107 regions, with a region size of
> 1G, using Snappy as the compression codec. The table has ~1.8 billion
> rows with a 48-character row key and measurement values as cell values,
> so values are rather small. Currently, the storefileIndexsize for each
> region server is ~1300M. We are afraid that, with an increasing number
> of rows, we will need quite an amount of RAM per RS just for holding
> the index. Is this roughly linear, e.g. if we double the number of rows
> to ~3.6 billion, will we have around 2600M storefileIndexsize?
>
>
>
> I found a few references discussing storefileIndexsize:
>
> http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&subj=a+question+storefileIndexSize
>
> http://hbase.apache.org/book.html#keysize
>
>
>
> The basic suggestion is to increase the block size (we currently use the
> default 64K) and to reduce the length of the row-key, column family and
> qualifier names. Are there more?
>
>
>
> True, in our prototypical implementation we have used rather
> human-readable names for column families and qualifiers. Does anybody
> have numbers from practice on how much storefileIndexsize decreased
> with shorter column family and qualifier names?
>
>
>
> Thanks,
>
> Thomas
>
>
>
>
