Hi Mark,

First please read this post: http://outerthought.org/blog/417-ot.html

Rest inline below.

On Nov 22, 2010, at 7:45, Mark Jarecki <[email protected]> wrote:

> Hi, 
> 
> I'm completely new to HBase and have some questions regarding cell 
> timestamps. 
> 
> My questions: Are there practical limitations to the number of versions 
> (timestamps) a cell can have? Can a cell have, say, a million versions? What 
> are the consequences of this many versions to performance and system 
> requirements? Or should composite row keys be used instead as sorted 
> indexes when numbers are this high?

You could use up to Integer.MAX_VALUE versions. So quite a lot :) The issue is 
that the system has to search through the versions for matches, so the more you 
have, the more it needs to scan. It may also blow out the size of the store 
file, since all versions belong to one row and a row cannot be split. 

If you expect many versions or large cell sizes, you may be better off using 
the composite key approach. 
 
> To illustrate my questions, I'm modelling the messages exchanged between any 
> 2 users on our system. The table is called "messages", the row key is a 
> composite of the two users' ids involved in the message exchange (e.g. 
> "user1:user2"). A column (e.g. "exchanges:message") contains a cell that is 
> regularly updated with the last message between those users. The cell's 
> timestamp is then used in conjunction with Get.setMaxVersions() and 
> Get.setTimeRange() to enable queries such as "Get the messages exchanged 
> between user1 and user2 since 12th October 12:02:02" or "Get the last 25 
> messages exchanged  between user1 and user2" or "Get all messages exchanged  
> between user1 and user2".
> 
> messages :  {
>       …
>       user1:user2 :  {
>               exchanges:message : {
>                       ...
>                       t3: "Not bad",
>                       t2: "How's it going?",
>                       t1: "Hello"
>               }
>       },
>       …
> } 
> 
> Over time, the number of messages exchanged between the 2 users will be 
> substantial - and growing. I'm concerned that cell versioning was NOT 
> intended for this purpose, and there might be consequences for having, say, 
> a million versions of a cell.

Yeah, this is not really something you want to solve with versions. If you add 
the timestamp to the row key, as in user1:user2:ts, you can use scans to get 
the messages between two timestamps etc. just the same. 
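
To make the key layout concrete, here is a minimal sketch in plain Java. It 
does not use the HBase client at all: a TreeMap stands in for the sorted rows 
of the table, and a sub-map stands in for a scan with a start and stop row. 
The class name, the helper methods and the zero-padded timestamp format are 
assumptions made for illustration only. The padding is there so that keys 
sort chronologically when compared as strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CompositeKeySketch {
    // Stand-in for a table: rows are kept sorted by row key.
    private final NavigableMap<String, String> rows = new TreeMap<>();

    // Hypothetical key layout: "<userA>:<userB>:<timestamp padded to 19 digits>".
    // Assumes non-negative timestamps; fixed width keeps string order == time order.
    static String rowKey(String userA, String userB, long ts) {
        return String.format(Locale.ROOT, "%s:%s:%019d", userA, userB, ts);
    }

    // One row per message instead of one version per message.
    void put(String userA, String userB, long ts, String message) {
        rows.put(rowKey(userA, userB, ts), message);
    }

    // Messages with fromTs <= ts < toTs, oldest first (start row inclusive,
    // stop row exclusive).
    List<String> scan(String userA, String userB, long fromTs, long toTs) {
        String startRow = rowKey(userA, userB, fromTs);
        String stopRow = rowKey(userA, userB, toTs);
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).values());
    }

    public static void main(String[] args) {
        CompositeKeySketch table = new CompositeKeySketch();
        table.put("user1", "user2", 1000L, "Hello");
        table.put("user1", "user2", 2000L, "How's it going?");
        table.put("user1", "user2", 3000L, "Not bad");
        table.put("user1", "user3", 1500L, "Unrelated");

        // "Messages between user1 and user2 since ts=2000"
        System.out.println(table.scan("user1", "user2", 2000L, Long.MAX_VALUE));
        // prints [How's it going?, Not bad]
    }
}
```

For "the last 25 messages" a common variation is to store Long.MAX_VALUE - ts 
in the key instead of ts, so the newest message sorts first and a scan can 
stop after 25 rows. Whether that fits depends on which query you run most. 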

> 
> 
> Thanks in advance.
> 
> Mark

Lars
