Hi, I just saw your recent update to the HBase book on the version number
question, and I'm also confused about it.
As the book says (HBASE-4251), setting the number of versions to an
exceedingly high level (e.g., hundreds or more) is not recommended unless
those old values are very dear to you, because this will greatly increase
StoreFile size.

But sometimes we do need to save multiple versions of a value, such as
logged events or Facebook messages. In these cases, what is the trade-off
between saving them in different rows and in different versions of one
row?
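
To make the "different rows" option concrete, here is a minimal sketch of
one possible row-key layout. It is only an illustration under assumptions:
the class and method names are hypothetical, user ids are assumed to be
fixed-width, and a sorted in-memory map stands in for HBase's ordering of
rows by key bytes. No HBase client calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RowPerEventSketch {

    // Hypothetical key layout: fixed-width user id followed by
    // (Long.MAX_VALUE - timestamp), so newer events sort first.
    static byte[] rowKey(String userId, long timestampMillis) {
        byte[] user = userId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(user.length + Long.BYTES)
                .put(user)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // True if key starts with the given prefix bytes.
    static boolean hasPrefix(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }

    // Walks the sorted "table" from the user's smallest possible key and
    // collects up to `limit` values, i.e. the most recent events.
    static List<String> latest(TreeMap<byte[], String> table, String userId, int limit) {
        byte[] prefix = userId.getBytes(StandardCharsets.UTF_8);
        byte[] start = rowKey(userId, Long.MAX_VALUE); // suffix of all zero bytes
        List<String> out = new ArrayList<>();
        for (Map.Entry<byte[], String> e : table.tailMap(start, true).entrySet()) {
            if (out.size() == limit || !hasPrefix(e.getKey(), prefix)) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }

    // Builds a tiny table with two users and returns the last two
    // queries of the first user.
    static List<String> demo() {
        // Unsigned lexicographic byte order stands in for row-key ordering.
        Comparator<byte[]> byteOrder = Arrays::compareUnsigned;
        TreeMap<byte[], String> table = new TreeMap<>(byteOrder);
        table.put(rowKey("user0001", 1000L), "q1");
        table.put(rowKey("user0001", 2000L), "q2");
        table.put(rowKey("user0001", 3000L), "q3");
        table.put(rowKey("user0002", 2500L), "other");
        return latest(table, "user0001", 2);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a layout like this, "the last X events" becomes a short scan from the
start of the user's key range, and the number of events per user is not
tied to the column family's max-versions setting. The versions approach
keeps everything in one row instead.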

Thank you.
Sean


2011/8/18 Doug Meil <[email protected]>

>
> Versioning can be used to see the previous state of a record.  Some people
> need this feature, others don't.
>
> One thing that may be worth a review is this...
>
> http://hbase.apache.org/book.html#keysize
>
> ... and specifically the fact about all the values being freighted with
> timestamp (aka version) too.  I don't know your use case, and I'm not sure
> I have the time to understand it, but 1 million versions seems like a lot.
>  You're going to use a lot of space doing that.
>
>
>
>
> On 8/17/11 11:53 AM, "Mark" <[email protected]> wrote:
>
> >I'm trying to fully understand all the possibilities of what HBase has
> >to offer but I can't determine a valid use case for multiple versions. Can
> >someone please explain some real life use cases for this?
> >
> >Also, at what point are there "too many versions"? For example, to store
> >all the queries a user has performed couldn't we create a column family
> >and have max versions set to something really high (1M). Using this
> >method we could then ask for the last X amount of queries by setting the
> >max versions to X. It seems like this can also be accomplished by
> >creating a separate row for each query but I'm not sure why one strategy
> >would be better than the other.
> >
> >Please help me understand. Thanks!
>
>
