This issue is a common pitfall for those new to HBase, and I think it would
be a good thing to have in the HBase book. Once someone realizes that you can
store multiple values for the same cell, each with a timestamp, there can be
a natural tendency to think "hey, I can store a one-to-many using multiple
versions of a cell". That's not the intent of versioned cell values.

Versioned cell values can be thought of as a way to keep a history of changes
for a single entity that at any given time has only one value, like keeping
track of a state change over time. For a one-to-many relationship (e.g., a
user with many events), favor either multiple rows or multiple columns
instead.
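
To make the contrast concrete, here is a minimal sketch in Python. It is a
toy in-memory model, not the HBase client API: ToyTable, event_row_key,
MAX_TS, the "user42" row key, and the sample queries are all made up for
illustration, and the table trims old versions eagerly on every put, which
is a simplification. It only shows why a capped version count silently
drops events while a row per event keeps them all.

```python
class ToyTable:
    """Toy stand-in for a single column family. NOT the HBase API."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row key, qualifier) -> {timestamp: value}
        self.cells = {}

    def put(self, row, qualifier, timestamp, value):
        versions = self.cells.setdefault((row, qualifier), {})
        versions[timestamp] = value  # same timestamp overwrites
        # Assumption: drop anything beyond max_versions right away.
        for old in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old]

    def get(self, row, qualifier, versions=1):
        # Newest first; only the most recent version by default.
        cell = self.cells.get((row, qualifier), {})
        return [cell[ts] for ts in sorted(cell, reverse=True)[:versions]]

    def scan_prefix(self, prefix, qualifier):
        # Rows come back in sorted row-key order, latest version of each.
        rows = sorted(
            r for (r, q) in self.cells
            if q == qualifier and r.startswith(prefix)
        )
        return [self.get(r, qualifier)[0] for r in rows]


MAX_TS = 10**13  # hypothetical upper bound on millisecond timestamps


def event_row_key(user, ts):
    # Reverse the timestamp so the newest event sorts first.
    return f"{user}#{MAX_TS - ts:013d}"


queries = [
    (1000, "hbase versions"),
    (2000, "hbase row key"),
    (3000, "hbase scan"),
    (4000, "hbase ttl"),
]

# One-to-many squeezed into versions of a single cell.
as_versions = ToyTable(max_versions=3)
for ts, q in queries:
    as_versions.put("user42", "query", ts, q)

# One row per event, keyed by user plus reversed timestamp.
as_rows = ToyTable(max_versions=1)
for ts, q in queries:
    as_rows.put(event_row_key("user42", ts), "query", ts, q)

latest = as_versions.get("user42", "query")
kept_versions = as_versions.get("user42", "query", versions=10)
all_events = as_rows.scan_prefix("user42#", "query")
```

In this sketch the version-based table keeps only the three newest queries,
while the row-per-event table returns all four, newest first, from a simple
prefix scan. Deleting or expiring a single event is also just a row delete
in the second layout.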

Bill


On Fri, Aug 26, 2011 at 9:16 AM, Buttler, David <[email protected]> wrote:

> Physically, you will be storing the same data.  HBase stores everything as
> key-value pairs.  The cell identifier is "row key, column family, column
> qualifier, timestamp".
>
> However, by storing items in different rows it is more convenient to query
> and delete old values.  By default you only get the most recent version of a
> column during a scan.
>
> One way to think about it is: versions are for when you don't want to
> forget previous versions, but you typically only want the most recent
> version.  If you want to be continuously accessing old versions, you would
> be better off putting them in separate rows.
>
> Dave
>
> -----Original Message-----
> From: Sheng Chen [mailto:[email protected]]
> Sent: Friday, August 26, 2011 1:38 AM
> To: [email protected]
> Subject: Re: Versioning
>
> Hi, I just saw your recent update of the HBase book on the version number
> question, and I'm also confused about it.
> As said in the book (HBASE-4251), it is not recommended to set the number
> of versions to an exceedingly high level (e.g., hundreds or more) unless
> those old values are very dear to you, because this will greatly increase
> StoreFile size.
>
> But sometimes we do need to save multiple versions of values, such as
> logged events or Facebook messages. In these cases, what is the trade-off
> between saving them in different rows and in different versions of one
> row?
>
> Thank you.
> Sean
>
>
> 2011/8/18 Doug Meil <[email protected]>
>
> >
> > Versioning can be used to see the previous state of a record.  Some
> > people need this feature, others don't.
> >
> > One thing that may be worth a review is this...
> >
> > http://hbase.apache.org/book.html#keysize
> >
> > ... and specifically the fact about all the values being freighted with
> > timestamp (aka version) too.  I don't know your use case, and I'm not
> > sure I have the time to understand it, but 1 million versions seems like
> > a lot.  You're going to use a lot of space doing that.
> >
> >
> >
> >
> > On 8/17/11 11:53 AM, "Mark" <[email protected]> wrote:
> >
> > >I'm trying to fully understand all the possibilities of what HBase has
> > >to offer but I can't determine a valid use case for multiple versions.
> > >Can someone please explain some real-life use cases for this?
> > >
> > >Also, at what point are there "too many versions"? For example, to
> > >store all the queries a user has performed, couldn't we create a column
> > >family and have max versions set to something really high (1M)? Using
> > >this method we could then ask for the last X queries by setting the max
> > >versions to X. It seems like this can also be accomplished by creating a
> > >separate row for each query, but I'm not sure why one strategy would be
> > >better than the other.
> > >
> > >Please help me understand. Thanks!
> >
> >
>
