Hi Anoop,

I agree. I am less concerned about the savings on disk than about the savings inside the block cache. I am not sure how stable PrefixDeltaEncoding is or who else uses it. If it is not widely used, are there people using FastDiff encoding? It seems like any form of encoding scheme would get us huge wins.
Thanks!
Varun

On Mon, Dec 3, 2012 at 8:23 PM, Anoop Sam John <[email protected]> wrote:

> Hi Varun
>
> It looks very clear that you need to use some sort of encoding scheme.
> PrefixDeltaEncoding would probably be fine. You can also look at the other
> algorithms, like FastDiff, and see how much space each can save in your
> case. I also suggest using the encoding for data on disk as well as in
> memory (block cache).
>
> > The total key size, as far as i know, would be 8 + 12 + 8 (timestamp) =
> > 28 bytes
>
> In every KV that gets stored, the stored size would be
> 4 (key length) + 4 (value length) + 2 (rowkey length) + 8 (rowkey) +
> 1 (cf length) + 12 (cf + qualifier) + 8 (timestamp) + 1 (type PUT/DELETE...) +
> value (0 bytes? at least 1 byte, right?) = 40+ bytes.
>
> Just making it clear for you :)
>
> -Anoop-
> ________________________________________
> From: Varun Sharma [[email protected]]
> Sent: Tuesday, December 04, 2012 2:36 AM
> To: Marcos Ortiz
> Cc: [email protected]
> Subject: Re: Long row + column keys
>
> Hi Marcos,
>
> Thanks for the links. We have gone through these and thought about the
> schema. My question is about whether using PrefixDeltaEncoding makes sense
> in our situation...
>
> Varun
>
> On Mon, Dec 3, 2012 at 12:36 PM, Marcos Ortiz <[email protected]> wrote:
>
> > Regards, Varun.
> > I think you can see Benoit Sigoure's (@tsuna) talk called
> > "Lessons learned from OpenTSDB" from the last HBaseCon. [1]
> > He explained in great detail how to design your schema to obtain the best
> > performance from HBase.
> >
> > Other recommended talks are "HBase Internals" from Lars and "HBase
> > Schema Design" from Ian. [2][3]
> >
> > [1] http://www.slideshare.net/cloudera/4-opentsdb-hbasecon
> > [2] http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final/
> > [3] http://www.slideshare.net/cloudera/5-h-base-schemahbasecon2012
> >
> > On 12/03/2012 02:58 PM, Varun Sharma wrote:
> >
> >> Hi,
> >>
> >> I have a schema where the rows are 8 bytes long and the columns are 12
> >> bytes long (roughly 1000 columns per row). The value is 0 bytes. Is this
> >> going to be space inefficient in terms of HFile size (large index +
> >> blocks)? The total key size, as far as i know, would be 8 + 12 + 8
> >> (timestamp) = 28 bytes. I am using hbase 0.94.0 which has HFile v2.
> >>
> > Yes, as you said, HFile v2 is included in 0.94. Although it is only in
> > trunk right now, you should keep following HBase development, especially
> > HBASE-5313 and HBASE-5521, because the development team is working on a
> > new file storage format called HFile v3, based on Trevni, a columnar
> > format for Avro by Doug Cutting. [4][5][6][7]
> >
> > [4] https://issues.apache.org/jira/browse/HBASE-5313
> > [5] https://issues.apache.org/jira/browse/HBASE-5521
> > [6] https://github.com/cutting/trevni
> > [7] https://issues.apache.org/jira/browse/AVRO-806
> >
> >> Also, should I be using an encoding technique to get the number of bytes
> >> down (like PrefixDeltaEncoding) which is provided by hbase ?
> >>
> > Read the Cloudera blog post called "HBase I/O - HFile" to see how the
> > Prefix and Diff encodings work, and decide which is more suitable for
> > you. [8]
> >
> > [8] http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
> >
> > I hope all this information helps you.
> > Best wishes
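
The per-KeyValue arithmetic discussed in this thread can be checked with a short Python sketch. The field sizes are the ones Anoop lists. The function names, the sample row key, and the sequential column qualifiers are made up for illustration, and the prefix model is a toy (a 1-byte shared-prefix length plus the unshared suffix), not HBase's actual encoded block format.

```python
# Per-KeyValue fixed fields as listed in the thread (sizes in bytes).
FIXED_OVERHEAD = {
    "key length": 4,
    "value length": 4,
    "rowkey length": 2,
    "cf length": 1,
    "timestamp": 8,
    "type": 1,
}


def keyvalue_size(row_len, cf_plus_qualifier_len, value_len):
    """Unencoded size of one KeyValue under the layout above."""
    return sum(FIXED_OVERHEAD.values()) + row_len + cf_plus_qualifier_len + value_len


def common_prefix_len(a, b):
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def toy_prefix_encoded_size(keys):
    """Toy model: each key costs 1 byte (shared-prefix length) plus its unshared suffix."""
    total = 0
    prev = b""
    for key in keys:
        shared = common_prefix_len(prev, key)
        total += 1 + (len(key) - shared)
        prev = key
    return total


# One row shaped like the schema in the thread: 8-byte row key,
# 1000 columns of 12 bytes each, empty values. The byte values are hypothetical.
row = b"\x00" * 7 + b"\x01"
columns = sorted(i.to_bytes(12, "big") for i in range(1000))
keys = [row + col for col in columns]

per_kv = keyvalue_size(row_len=8, cf_plus_qualifier_len=12, value_len=0)
raw = sum(len(k) for k in keys)          # row + column bytes without any encoding
encoded = toy_prefix_encoded_size(keys)  # same bytes under the toy prefix model

print("bytes per unencoded KeyValue:", per_kv)
print("row+column bytes, raw vs toy prefix model:", raw, encoded)
```

Summing the listed fields for an 8-byte row key, 12 bytes of family plus qualifier, and an empty value gives 40 bytes per KeyValue. With 1000 columns under one row key, even this toy model avoids repeating the row key in every cell; real savings depend on how much the qualifiers themselves have in common and on the encoder chosen.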
