Hi! Thanks for your comments and the link. We will have a mix of bulk processing via MapReduce and random reads by row key through a Thrift/Java API client.
Thanks,
Thomas

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
Sent: Friday, 30 September 2011 19:45
To: [email protected]
Subject: Re: storefileIndexsize

From the discussion in HBASE-3551, you can compute the numbers you need. This comment is important:

https://issues.apache.org/jira/browse/HBASE-3551?focusedCommentId=13005272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13005272

You can also use the HFile tool on your own files to see the current situation.

Changing the block size is easy and should fix your problem. The only thing to keep in mind is that if you are doing a lot of random reads, you will probably thrash your block cache a lot and fetch more data than you need. But with the number of rows you have, something tells me that is not your case.

J-D

On Thu, Sep 29, 2011 at 10:25 PM, Steinmaurer Thomas <[email protected]> wrote:
> Hello,
>
> In a prototypical cluster we have 8 region servers with 4G HBase heap
> space. Each region server has about 107 regions, with a region size of
> 1G, using Snappy as the compression codec. The table has ~1.8 billion
> rows with a 48-character row key and measurement values as cell values,
> so values are rather small. Currently, the storefileIndexsize for each
> region server is ~1300M. We are afraid that with an increasing number
> of rows, we will need quite an amount of RAM per RS just for holding
> the index. Is this somehow linear, e.g. if we double the number of
> rows to ~3.6 billion, will we have around 2600M storefileIndexsize?
>
> I found a few references discussing storefileIndexsize:
>
> http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&subj=a+question+storefileIndexSize
>
> http://hbase.apache.org/book.html#keysize
>
> The basic suggestion is to increase the block size (we currently use
> the default 64K) and to reduce the length of the row key, column
> family and qualifier names. Are there more?
>
> True, in our prototypical implementation we have used rather
> "well readable" names for column families and qualifiers. Does anybody
> have numbers from practice on how much storefileIndexsize decreased
> with shorter column family and qualifier names?
>
> Thanks,
> Thomas
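
A rough back-of-envelope sketch of the two questions in this thread, in Python. It assumes the block index holds roughly one entry per data block, so that the index grows linearly with the amount of data and shrinks in inverse proportion to the block size; this is a first-order approximation, not a measured result. The fixed 12 bytes per key come from the KeyValue key layout (2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type). The function names and the family/qualifier lengths in the usage note are hypothetical, and per-entry overhead beyond the key bytes (block offsets, Java object overhead) is not counted.

```python
FIXED_KEY_OVERHEAD = 2 + 1 + 8 + 1  # row length, family length, timestamp, key type


def index_key_bytes(row_len, family_len, qualifier_len):
    """Bytes of one KeyValue key, as stored once per block index entry.

    Counts key bytes only; other per-entry overhead is ignored (assumption).
    """
    if min(row_len, family_len, qualifier_len) < 0:
        raise ValueError("lengths must be non-negative")
    return FIXED_KEY_OVERHEAD + row_len + family_len + qualifier_len


def scaled_index_size_mb(current_index_mb, row_growth=1.0,
                         current_block_kb=64, new_block_kb=64):
    """First-order estimate of storefileIndexsize after a change.

    Assumes one index entry per block: entries scale with the data volume
    (row_growth) and inversely with the block size.
    """
    if row_growth <= 0 or current_block_kb <= 0 or new_block_kb <= 0:
        raise ValueError("growth factor and block sizes must be positive")
    return current_index_mb * row_growth * current_block_kb / new_block_kb


if __name__ == "__main__":
    # Doubling the rows at the default 64K block size.
    print(scaled_index_size_mb(1300, row_growth=2.0))
    # Same rows, block size raised from 64K to 128K.
    print(scaled_index_size_mb(1300, new_block_kb=128))
    # Hypothetical name lengths: 12-char family and 20-char qualifier
    # versus 1-char family and 2-char qualifier, with the 48-char row key.
    print(index_key_bytes(48, 12, 20), index_key_bytes(48, 1, 2))
```

Under these assumptions, doubling the rows doubles the 1300M figure, and doubling the block size halves it. The HFile tool mentioned above remains the way to check the real per-file numbers.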
