Sean,

You wrote the following:

> But sometimes, we do need to save multiple versions of values, such as
> logging events, or messages of Facebook. In these cases, what is the trade
> off between saving them in different rows, and in different versions of one
> row?

You're not updating logging events, so why would you consider versioning? Each log event is unique, so you'd store them as separate rows.

Think of versioning as allowing one to roll back an update in a transactional system. (Note: HBase doesn't have transactions or 'updates'. I'm just trying to translate the concept.)
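
To make the trade-off concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the HBase client API: the names VersionedCell, RowPerEventTable and row_key are hypothetical, and the "user id plus reversed timestamp" key layout is only one assumed way to lay out the rows. The toy also drops excess versions immediately on write, which a real store would not necessarily do.

```python
# Toy in-memory model for illustration only. This is NOT the HBase client API;
# every class and function name below is hypothetical.

MAX_LONG = 2**63 - 1  # same value as Java's Long.MAX_VALUE


class VersionedCell:
    """One cell that keeps at most max_versions values, newest first."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value) pairs, newest first

    def put(self, timestamp, value):
        # A write with an existing timestamp replaces the earlier value,
        # so two events with the same timestamp cannot both be kept.
        self.versions = [(t, v) for t, v in self.versions if t != timestamp]
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        # The toy discards everything beyond max_versions right away.
        del self.versions[self.max_versions:]

    def get(self, n):
        """Return up to n values, newest first."""
        return [v for _, v in self.versions[:n]]


def row_key(user_id, timestamp_ms):
    """Assumed key layout: user id plus zero-padded reversed timestamp,
    so that newer events sort before older ones."""
    return f"{user_id}:{MAX_LONG - timestamp_ms:019d}"


class RowPerEventTable:
    """One row per event, looked up by a sorted prefix scan."""

    def __init__(self):
        self.rows = {}

    def put(self, user_id, timestamp_ms, value):
        self.rows[row_key(user_id, timestamp_ms)] = value

    def latest(self, user_id, n):
        """Return up to n values for user_id, newest first."""
        prefix = f"{user_id}:"
        keys = sorted(k for k in self.rows if k.startswith(prefix))
        return [self.rows[k] for k in keys[:n]]


cell = VersionedCell(max_versions=3)
table = RowPerEventTable()
for ts, query in enumerate(["q1", "q2", "q3", "q4", "q5"], start=1):
    cell.put(ts, query)           # every query is a new version of one cell
    table.put("mark", ts, query)  # every query is its own row

print(cell.get(5))               # capped at max_versions
print(table.latest("mark", 5))   # every event is still there
```

In the toy, the versioned cell silently loses everything past its cap, while the row-per-event table keeps each event as a unique, independently addressable record. That is the sense in which log events fit separate rows better than versions.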

HTH

-Mike

> Date: Fri, 26 Aug 2011 16:37:46 +0800
> Subject: Re: Versioning
> From: [email protected]
> To: [email protected]
>
> Hi, I just saw your recent update of the hbase book on the version number
> question, and I'm also confused about it.
> As said in the book (HBASE-4251), it is not recommended setting the number
> of versions to an exceedingly high level (e.g., hundreds or more) unless
> those old values are very dear to you because this will greatly increase
> StoreFile size.
>
> But sometimes, we do need to save multiple versions of values, such as
> logging events, or messages of Facebook. In these cases, what is the trade
> off between saving them in different rows, and in different versions of one
> row?
>
> Thank you.
> Sean
>
>
> 2011/8/18 Doug Meil <[email protected]>
>
> >
> > Versioning can be used to see the previous state of a record. Some people
> > need this feature, others don't.
> >
> > One thing that may be worth a review is this...
> >
> > http://hbase.apache.org/book.html#keysize
> >
> > ... and specifically the fact about all the values being freighted with
> > timestamp (aka version) too. I don't know your use case, and I'm not sure
> > I have the time to understand it, but 1 million versions seems like a lot.
> > You're going to use a lot of space doing that.
> >
> >
> > On 8/17/11 11:53 AM, "Mark" <[email protected]> wrote:
> >
> > >I'm trying to fully understand all the possibilities of what HBase has
> > >to offer but I can't determine a valid use case for multiple versions. Can
> > >someone please explain some real life use cases for this?
> > >
> > >Also, at what point is there "too many versions". For example to store
> > >all the queries a user has performed couldn't we create a column family
> > >and have max versions set to something really high (1M). Using this
> > >method we could then ask for the last X amount of queries by setting the
> > >max versions to X. It seems like this can also be accomplished by
> > >creating a separate row for each query but I'm not sure why one strategy
> > >would be better than the other.
> > >
> > >Please help me understand. Thanks!
> >
