Good observation, Bill... I'll add it.
On 8/26/11 12:27 PM, "Bill Graham" <[email protected]> wrote:

>This issue is a common pitfall for those new to HBase and I think it could
>be a good thing to have in the HBase book. Once someone realizes that you
>can store multiple values for the same cell, each with a timestamp, there
>can be a natural tendency to think "hey, I can store a one-to-many using
>multiple versions of a cell". That's not the intent of versioned cell
>values.
>
>Versioned cell values can be thought of as a way to keep a history of
>change for a single entity that at any given time only has one value. Like
>keeping track of a state change over time. For a one-to-many relationship
>(i.e., a user with many events), favor either multiple rows or multiple
>columns instead.
>
>Bill
>
>
>On Fri, Aug 26, 2011 at 9:16 AM, Buttler, David <[email protected]> wrote:
>
>> Physically, you will be storing the same data. HBase stores everything
>> as key-value pairs. The cell identifier is "row key, column family,
>> column qualifier, timestamp".
>>
>> However, by storing items in different rows it is more convenient to
>> query and delete old values. By default you only get the most recent
>> version of a column during a scan.
>>
>> One way to think about it is: versions are for when you don't want to
>> forget previous versions, but you typically only want the most recent
>> version. If you want to be continuously accessing old versions, you
>> would be better off putting them in separate rows.
>>
>> Dave
>>
>> -----Original Message-----
>> From: Sheng Chen [mailto:[email protected]]
>> Sent: Friday, August 26, 2011 1:38 AM
>> To: [email protected]
>> Subject: Re: Versioning
>>
>> Hi, I just saw your recent update of the HBase book on the version
>> number question, and I'm also confused about it.
>> As said in the book (HBASE-4251), it is not recommended to set the
>> number of versions to an exceedingly high level (e.g., hundreds or more)
>> unless those old values are very dear to you, because this will greatly
>> increase StoreFile size.
>>
>> But sometimes we do need to save multiple versions of values, such as
>> logging events, or Facebook messages. In these cases, what is the
>> trade-off between saving them in different rows and in different
>> versions of one row?
>>
>> Thank you.
>> Sean
>>
>>
>> 2011/8/18 Doug Meil <[email protected]>
>>
>> >
>> > Versioning can be used to see the previous state of a record. Some
>> > people need this feature, others don't.
>> >
>> > One thing that may be worth a review is this...
>> >
>> > http://hbase.apache.org/book.html#keysize
>> >
>> > ... and specifically the fact about all the values being freighted
>> > with timestamp (aka version) too. I don't know your use case, and I'm
>> > not sure I have the time to understand it, but 1 million versions
>> > seems like a lot. You're going to use a lot of space doing that.
>> >
>> >
>> > On 8/17/11 11:53 AM, "Mark" <[email protected]> wrote:
>> >
>> > >I'm trying to fully understand all the possibilities of what HBase
>> > >has to offer but I can't determine a valid use case for multiple
>> > >versions. Can someone please explain some real life use cases for
>> > >this?
>> > >
>> > >Also, at what point are there "too many versions"? For example, to
>> > >store all the queries a user has performed, couldn't we create a
>> > >column family and have max versions set to something really high
>> > >(1M)? Using this method we could then ask for the last X queries by
>> > >setting the max versions to X. It seems like this can also be
>> > >accomplished by creating a separate row for each query, but I'm not
>> > >sure why one strategy would be better than the other.
>> > >
>> > >Please help me understand. Thanks!
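
For anyone reading this thread later, the versions-vs-rows trade-off
discussed above can be sketched with a toy in-memory model. This is only an
illustration under stated assumptions: it is plain Python, not the HBase
client API, and every name in it (ToyTable, put, get, scan, the "q" family,
the "user1-<timestamp>" row key layout) is made up for the example. It
models the cell identifier Dave describes (row key, column family, column
qualifier, timestamp), a per-table max-versions limit, and the "latest
version only" default on reads.

```python
# Toy model of versioned cells. NOT the HBase client API; all names are
# hypothetical and exist only to illustrate the thread's argument.


class ToyTable:
    """Keeps values per (row, family, qualifier), newest timestamp first."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row, family, qualifier) -> list of (timestamp, value), newest first
        self.cells = {}

    def put(self, row, family, qualifier, timestamp, value):
        versions = self.cells.setdefault((row, family, qualifier), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        # Assumption: the toy drops surplus versions immediately on write.
        del versions[self.max_versions:]

    def get(self, row, family, qualifier, versions=1):
        """Return up to `versions` values, newest first (default: latest only)."""
        stored = self.cells.get((row, family, qualifier), [])
        return [value for _, value in stored[:versions]]

    def scan(self, row_prefix):
        """Return (row, latest value) pairs for matching rows, in row-key order."""
        hits = []
        for (row, _family, _qualifier), stored in sorted(self.cells.items()):
            if row.startswith(row_prefix) and stored:
                hits.append((row, stored[0][1]))
        return hits


queries = ["hbase", "hadoop", "zookeeper", "pig"]

# Design A: one row per user; every query is a new version of the same cell.
by_version = ToyTable(max_versions=3)
for ts, query in enumerate(queries, start=1):
    by_version.put("user1", "q", "query", ts, query)

# Design B: one row per query; the timestamp is part of the row key.
by_row = ToyTable(max_versions=1)
for ts, query in enumerate(queries, start=1):
    by_row.put(f"user1-{ts:010d}", "q", "query", ts, query)

latest_only = by_version.get("user1", "q", "query")
last_three = by_version.get("user1", "q", "query", versions=3)
all_rows = by_row.scan("user1-")

print(latest_only)  # ['pig']
print(last_three)   # ['pig', 'zookeeper', 'hadoop']
print([value for _, value in all_rows])  # ['hbase', 'hadoop', 'zookeeper', 'pig']
```

In Design A the oldest query ("hbase") is gone once the version limit is
exceeded, and a plain read returns only the newest value, which fits the
"history of one entity" use Bill describes. In Design B every query is an
ordinary row that a prefix scan returns, which fits the one-to-many case.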
