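Unlike an RDBMS, HBase stores its data as key-value pairs (in HFiles on HDFS), and every key-value carries the full coordinates of its cell: row key, column family, column qualifier and timestamp. Hence the row key is stored again for every version of every cell.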
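To put a number on that, here is a rough Python sketch of the uncompressed size of one cell. It follows the KeyValue layout described in the HBase reference guide and assumes no tags, no compression and no data block encoding. The 1-byte column family (a hypothetical family `d`), the 8-byte qualifier (a hypothetical `YYYYMMDD` date) and the 8-byte counter value are illustrative assumptions; the 7300 columns and the 2 KB description come from the question in the thread.

```python
# Per-cell size of an HBase KeyValue, assuming no tags, compression or encoding:
#   <keylength:4><valuelength:4><key><value>
#   key = <rowlength:2><row><familylength:1><family><qualifier><timestamp:8><keytype:1>

KEY_FIXED_OVERHEAD = 2 + 1 + 8 + 1  # row length, family length, timestamp, key type
KV_LENGTH_FIELDS = 4 + 4            # key length and value length prefixes


def keyvalue_size(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> int:
    """Bytes for one cell version; the full row key is included every time."""
    key_len = KEY_FIXED_OVERHEAD + row_len + family_len + qualifier_len
    return KV_LENGTH_FIELDS + key_len + value_len


def row_size(row_len: int, family_len: int, qualifier_len: int,
             value_len: int, cells: int) -> int:
    """Bytes for a row holding `cells` cell versions of identical shape."""
    return cells * keyvalue_size(row_len, family_len, qualifier_len, value_len)


if __name__ == "__main__":
    # 7300 daily counters under a 2 KB description vs. an 8-byte surrogate key
    print(row_size(2048, 1, 8, 8, 7300))  # 15220500
    print(row_size(8, 1, 8, 8, 7300))     # 328500
```

Under these assumptions the repeated 2 KB row key accounts for almost all of the roughly 15 MB row, while an 8-byte key brings it down to about 0.3 MB. Real HFiles add block and index overhead on top, so these are estimates, not measurements.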
On Tue, Sep 17, 2013 at 7:53 PM, Ted Yu wrote:
> w.r.t. Data Block Encoding, you can find some performance numbers here:
>
> https://issues.apache.org/jira/browse/HBASE-4218?focusedCommentId=13123337&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13123337
>
> On Tue, Sep 17, 2013 at 10:49 AM, Adrian CAPDEFIER wrote:
>
> > Thank you for confirming the rowkey is written for every cell value (I was referring to 6.3.2 indeed). I have looked into data block encoding, but I'm not sure that would help me (more so if I need to link this table to a separate table later on).
> >
> > I will look into the surrogate value option.
> >
> > On Tue, Sep 17, 2013 at 5:53 PM, Ted Yu wrote:
> >
> > > I guess you were referring to section 6.3.2
> > >
> > > bq. rowkey is stored and/or read for every cell value
> > >
> > > The above is true.
> > >
> > > bq. the event description is a string of 0.1 to 2Kb
> > >
> > > You can enable Data Block encoding to reduce storage.
> > >
> > > Cheers
> > >
> > > On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER wrote:
> > >
> > > > Howdy all,
> > > >
> > > > I'm trying to use hbase for the first time (plenty of other experience with RDBMS databases though), and I have a couple of questions after reading The Book.
> > > >
> > > > I am a bit confused by the advice to reduce "the row size" in the hbase book. It states that every cell value is accompanied by its coordinates (row, column and timestamp). I'm just trying to be thorough, so am I to understand that the rowkey is stored and/or read for every cell value in a record, or just once per column family in a record?
> > > >
> > > > I am intrigued by the rows-as-columns design described in the book at http://hbase.apache.org/book.html#rowkey.design. To make a long story short, I will end up with a table to store event types and the number of occurrences on each day. I would prefer to have the event description as the row key and the dates when it happened as columns: up to 7300 for roughly 20 years.
> > > > However, the event description is a string of 0.1 to 2 KB, and if it is stored for each cell value, I will need to use a surrogate (shorter) value.
> > > >
> > > > Is there built-in functionality to generate (integer) surrogate values in hbase that can be used for the rowkey, or does it need to be hand-coded from scratch?
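On the surrogate question: a common pattern is to allocate IDs from a counter cell bumped with the client's atomic increment (`incrementColumnValue`) and to keep a forward table (description to ID) and a reverse table (ID to description). The Python sketch below only simulates that scheme in memory: the class and method names are hypothetical, dicts stand in for the two lookup tables, and a lock stands in for the per-row atomicity HBase would provide. Against a real cluster, the lookup-then-insert step can race between concurrent writers, which is usually closed with a conditional put such as `checkAndPut` on the forward table.

```python
import struct
import threading


class SurrogateKeyDictionary:
    """Hypothetical in-memory stand-in for a counter cell plus two lookup tables."""

    def __init__(self) -> None:
        self._lock = threading.Lock()        # plays the role of HBase's atomic ops
        self._counter = 0                    # stands in for the counter cell
        self._forward: dict[str, int] = {}   # description -> id
        self._reverse: dict[int, str] = {}   # id -> description

    def get_or_create(self, description: str) -> int:
        """Return the existing id for a description, or allocate the next one."""
        with self._lock:
            existing = self._forward.get(description)
            if existing is not None:
                return existing
            self._counter += 1               # simulated atomic increment
            new_id = self._counter
            self._forward[description] = new_id
            self._reverse[new_id] = description
            return new_id

    def describe(self, surrogate_id: int) -> str:
        """Reverse lookup, e.g. to render query results."""
        return self._reverse[surrogate_id]


def encode_row_key(surrogate_id: int) -> bytes:
    """Fixed-width 8-byte big-endian encoding, so keys sort numerically."""
    return struct.pack(">Q", surrogate_id)


if __name__ == "__main__":
    ids = SurrogateKeyDictionary()
    a = ids.get_or_create("disk full on node 17")
    b = ids.get_or_create("login failure")
    assert ids.get_or_create("disk full on node 17") == a
    print(a, b, encode_row_key(a).hex())  # 1 2 0000000000000001
```

Because the IDs increase monotonically, newly created event types always sort at the end of the table; with only a few new event types per day that is unlikely to matter, but it is the usual hotspotting caveat for sequential row keys.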

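Ted's data block encoding suggestion helps because consecutive cells in a wide row share the same long row key. The Python sketch below is a simplified prefix encoder in that spirit: each sorted key is stored as the number of bytes it shares with the previous key plus the remaining suffix. HBase's actual PREFIX, DIFF and FAST_DIFF encodings also handle timestamps and other key fields and use their own on-disk format, so treat this as an illustration only. The 2 KB row key and the 2-byte day-number qualifier are hypothetical choices.

```python
def prefix_encode(sorted_keys: list[bytes]) -> list[tuple[int, bytes]]:
    """Replace each key by (bytes shared with the previous key, remaining suffix)."""
    encoded = []
    previous = b""
    for key in sorted_keys:
        shared = 0
        limit = min(len(previous), len(key))
        while shared < limit and previous[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decode(encoded: list[tuple[int, bytes]]) -> list[bytes]:
    """Rebuild the original keys by reusing the prefix of the previous key."""
    keys = []
    previous = b""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


if __name__ == "__main__":
    row = b"e" * 2048  # hypothetical 2 KB event description
    # hypothetical 2-byte big-endian day number as the column qualifier
    keys = [row + day.to_bytes(2, "big") for day in range(7300)]
    enc = prefix_encode(keys)
    assert prefix_decode(enc) == keys
    raw = sum(len(k) for k in keys)
    stored = sum(len(suffix) for _, suffix in enc)  # ignores per-entry length headers
    print(raw, stored)  # 14965000 9377
```

In HBase, encoding is enabled per column family through the `DATA_BLOCK_ENCODING` attribute. It shrinks the stored blocks, but the logical key is unchanged, so clients still receive the full row key with every cell; that is one reason a surrogate key can still be worth having.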