Hello,
In a prototypical cluster we have 8 region servers with 4G of HBase heap space each. Each region server holds about 107 regions, with a region size of 1G, using Snappy as the compression codec. The table has ~1.8 billion rows with a 48-character row key and measurement values as cell values, so the values are rather small. Currently, the storefileIndexSize for each region server is ~1300M.

We are afraid that with an increasing number of rows we will need quite an amount of RAM per RS just for holding the index. Is this somehow linear, e.g. if we double the number of rows to ~3.6 billion, will we have around 2600M of storefileIndexSize?

I found a few references discussing storefileIndexSize:

http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&subj=a+question+storefileIndexSize
http://hbase.apache.org/book.html#keysize

The basic suggestions are to increase the block size (we currently use the default 64K) and to reduce the length of the row key, column family, and qualifier names. Are there more? True, in our prototypical implementation we have used rather "readable" names for column families and qualifiers. Does anybody have numbers from practice on how much storefileIndexSize decreased with shorter column family and qualifier names?

Thanks,
Thomas
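P.S. To make the linearity question concrete, here is a rough back-of-envelope model, not HBase internals: the block index holds roughly one entry per HFile block (the block's first key plus offset/size metadata), so the index size grows linearly with the total data volume and shrinks inversely with the block size. The per-row size, per-entry overhead, and full-key length below are assumptions for illustration only.

```python
def estimate_index_size(num_rows, bytes_per_row, block_size, key_len,
                        entry_overhead=16):
    """Rough estimate of total block index size across all store files.

    Assumes one index entry per HFile block, each entry costing about
    the first key of the block (key_len) plus a fixed offset/size
    overhead (entry_overhead). All parameters are assumptions, not
    values read from HBase internals.
    """
    total_data = num_rows * bytes_per_row        # uncompressed data volume
    num_blocks = total_data // block_size        # one index entry per block
    return num_blocks * (key_len + entry_overhead)

# ~1.8 billion rows, assuming ~100 bytes stored per row and an ~80-byte
# full key (48-char row key + family + qualifier + timestamp):
small_blocks = estimate_index_size(1_800_000_000, 100, 64 * 1024, 80)
large_blocks = estimate_index_size(1_800_000_000, 100, 256 * 1024, 80)

print(f"64K blocks:  ~{small_blocks / 1024 / 1024:.0f} MB index")
print(f"256K blocks: ~{large_blocks / 1024 / 1024:.0f} MB index")
# Quadrupling the block size cuts the index to roughly a quarter;
# doubling the rows roughly doubles it.
```

Under this model, shortening family/qualifier names helps proportionally to their share of the full key length, while raising the block size helps proportionally to the block-size factor.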
