Don't forget to look at this section for HBase schema design examples:
http://hbase.apache.org/book.html#schema.casestudies
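
On the surrogate key / dictionary question quoted below: since HBase has nothing built in for this, here is a rough sketch of the idea, a 1-1 mapping between arbitrary strings and integers. It is purely in-memory and single-process, and the names (SurrogateDictionary, encode, decode, rowkey) are made up for illustration, not part of any HBase API. The 4-byte key width is also just an assumption.

```python
import struct


class SurrogateDictionary:
    """In-memory 1-1 mapping between strings and integer surrogate ids.

    Hypothetical helper for illustration only; not an HBase API.
    """

    def __init__(self):
        self._id_by_value = {}  # description -> surrogate id
        self._value_by_id = []  # surrogate id -> description

    def encode(self, value):
        """Return the id for value, assigning the next free id on first use."""
        surrogate = self._id_by_value.get(value)
        if surrogate is None:
            surrogate = len(self._value_by_id)
            self._id_by_value[value] = surrogate
            self._value_by_id.append(value)
        return surrogate

    def decode(self, surrogate):
        """Return the original string for a previously assigned id."""
        return self._value_by_id[surrogate]

    def rowkey(self, value):
        """Fixed-width 4-byte big-endian key (assumed width)."""
        return struct.pack(">I", self.encode(value))


events = SurrogateDictionary()
key = events.rowkey("disk failure on node 17")  # 4 bytes instead of the full text
```

In a real deployment the mapping would have to live somewhere shared (for example a separate lookup table) and id assignment would have to be atomic across clients. HBase does have atomic counters (incrementColumnValue in the client API) that could hand out ids, but I have not sketched or tested that here.
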
On 9/17/13 1:52 PM, "Adrian CAPDEFIER" <[email protected]> wrote:

>Thanks for the tip. In the data warehousing world I used to call them
>surrogate keys - I wonder if there's any difference between the two.
>
>
>On Tue, Sep 17, 2013 at 6:41 PM, Vladimir Rodionov
><[email protected]> wrote:
>
>> > Is there a built-in functionality to generate (integer) surrogate
>> > values in hbase that can be used on the rowkey or does it need to be
>> > hand coded from scratch?
>>
>> There is no such functionality in HBase. What you are asking for is
>> known as dictionary compression: a unique 1-1 association between
>> arbitrary strings and numeric values.
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [email protected]
>>
>> ________________________________________
>> From: Ted Yu [[email protected]]
>> Sent: Tuesday, September 17, 2013 9:53 AM
>> To: [email protected]
>> Subject: Re: hbase schema design
>>
>> I guess you were referring to section 6.3.2
>>
>> bq. rowkey is stored and/ or read for every cell value
>>
>> The above is true.
>>
>> bq. the event description is a string of 0.1 to 2Kb
>>
>> You can enable Data Block encoding to reduce storage.
>>
>> Cheers
>>
>> On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER
>> <[email protected]> wrote:
>>
>> > Howdy all,
>> >
>> > I'm trying to use hbase for the first time (plenty of other experience
>> > with RDBMS database though), and I have a couple of questions after
>> > reading The Book.
>> >
>> > I am a bit confused by the advice to reduce "the row size" in the hbase
>> > book. It states that every cell value is accompanied by the coordinates
>> > (row, column and timestamp). I'm just trying to be thorough, so am I to
>> > understand that the rowkey is stored and/ or read for every cell value
>> > in a record or just once per column family in a record?
>> >
>> > I am intrigued by the rows as columns design as described in the book
>> > at http://hbase.apache.org/book.html#rowkey.design. To make a long
>> > story short, I will end up with a table to store event types and the
>> > number of occurrences in each day. I would prefer to have the event
>> > description as the row key and the dates when it happened as columns -
>> > up to 7300 for roughly 20 years.
>> > However, the event description is a string of 0.1 to 2Kb and if it is
>> > stored for each cell value, I will need to use a surrogate (shorter)
>> > value.
>> >
>> > Is there a built-in functionality to generate (integer) surrogate
>> > values in hbase that can be used on the rowkey or does it need to be
>> > hand coded from scratch?
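
To put rough numbers on the concern above: with the rowkey stored once per cell, the bytes spent repeating the key in one row are roughly key length times number of cells. The sketch below is back-of-the-envelope only. It assumes "Kb" means kilobytes of 1024 bytes and a hypothetical 4-byte surrogate, and it ignores the column qualifier, timestamp, compression and data block encoding. The function name is made up for illustration.

```python
def rowkey_bytes_per_row(rowkey_len, cells):
    """Bytes spent repeating the rowkey when it is stored once per cell."""
    return rowkey_len * cells


CELLS = 7300  # one column per day for roughly 20 years (from the thread)

# Key lengths in bytes: the 0.1 KB and 2 KB bounds from the thread,
# plus an assumed 4-byte integer surrogate.
for label, key_len in [
    ("0.1 KB description", 100),
    ("2 KB description", 2048),
    ("4 byte surrogate", 4),
]:
    print(label, rowkey_bytes_per_row(key_len, CELLS), "bytes")
```

For a fully populated row that is on the order of 15 MB of repeated key for a 2 KB description versus about 29 KB for a 4-byte surrogate, before any encoding or compression is applied.
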
