Hi, 

I'm completely new to HBase and have some questions regarding cell timestamps. 

My questions: Are there practical limits on the number of versions
(timestamps) a cell can have? Can a cell have, say, a million versions? What
are the consequences of that many versions for performance and system
requirements? Or should composite row keys be used as sorted indexes instead
when the numbers are this high?

To illustrate my questions, I'm modelling the messages exchanged between any 2
users on our system. The table is called "messages", and the row key is a
composite of the two users' ids involved in the message exchange (e.g.
"user1:user2"). A column (e.g. "exchanges:message") contains a cell that is
updated with each new message between those users. The cell's timestamp is then
used in conjunction with Get.setMaxVersions() and Get.setTimeRange() to enable
queries such as "Get the messages exchanged between user1 and user2 since 12th
October 12:02:02", "Get the last 25 messages exchanged between user1 and user2"
or "Get all messages exchanged between user1 and user2".

messages :  {
        …
        user1:user2 :  {
                exchanges:message : {
                        ...
                        t3: "Not bad",
                        t2: "How's it going?",
                        t1: "Hello"
                }
        },
        …
} 
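To make the three queries concrete, here is a rough in-memory model of how I
understand reads on a versioned cell to behave. It is not HBase code: the
VersionedCell class and its methods are my own hypothetical names, and I'm
assuming that setTimeRange() uses a half-open range [min, max), that versions
come back newest first, and that the column family's VERSIONS setting caps how
many versions stay visible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one cell, e.g. "exchanges:message" in row "user1:user2".
// This is NOT the HBase client API; it only models how I understand versioned reads to behave.
final class VersionedCell {
    // Versions keyed by timestamp, iterated newest first.
    private final NavigableMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());
    // Assumed to play the role of the column family's VERSIONS setting.
    private final int maxRetained;

    VersionedCell(int maxRetained) {
        this.maxRetained = maxRetained;
    }

    // Each update writes a new version at the given timestamp.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxRetained) {
            versions.pollLastEntry(); // with reverse ordering, the last entry is the oldest
        }
    }

    // Roughly what Get.setTimeRange(minStamp, maxStamp) plus Get.setMaxVersions(maxVersions)
    // would return for this one column: versions in [minStamp, maxStamp), newest first.
    List<String> get(long minStamp, long maxStamp, int maxVersions) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            if (out.size() == maxVersions) {
                break;
            }
            long ts = e.getKey();
            if (ts >= minStamp && ts < maxStamp) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(Integer.MAX_VALUE);
        cell.put(1L, "Hello");
        cell.put(2L, "How's it going?");
        cell.put(3L, "Not bad");
        System.out.println(cell.get(0L, Long.MAX_VALUE, Integer.MAX_VALUE)); // all messages
        System.out.println(cell.get(0L, Long.MAX_VALUE, 2));                 // last 2 messages
        System.out.println(cell.get(2L, Long.MAX_VALUE, Integer.MAX_VALUE)); // since t2
    }
}
```

In this model, "the last 25 messages" is get(0, Long.MAX_VALUE, 25), and "since
12th October 12:02:02" would pass that moment's epoch-millisecond timestamp as
minStamp.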

Over time, the number of messages exchanged between the 2 users will be
substantial, and growing. I'm concerned that cell versioning was NOT intended
for this purpose, and that there may be consequences to having, say, a million
versions of a cell.
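For comparison, here is a rough sketch of the composite-row-key alternative from
my first question: one row per message, with a key like
"user1:user2:<reversed timestamp>" so that the newest messages sort first. A
TreeMap stands in for the table's sorted rows; the class, the key layout and the
19-digit zero padding are all my own assumptions, not anything HBase prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical model of the composite-row-key alternative: one row per message.
// A TreeMap<String, String> stands in for the table; like HBase rows, its keys are kept sorted.
final class CompositeKeyMessages {
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Assumed key layout: "<conversation>:<Long.MAX_VALUE - timestamp>", zero-padded to 19 digits
    // (the width of Long.MAX_VALUE) so that string order matches numeric order and newer
    // messages sort first. Assumes non-negative timestamps; two messages with the same
    // timestamp would overwrite each other in this sketch.
    static String rowKey(String conversation, long timestamp) {
        return conversation + ":" + String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - timestamp);
    }

    void put(String conversation, long timestamp, String message) {
        rows.put(rowKey(conversation, timestamp), message);
    }

    // "Last n messages": scan the conversation's key prefix and stop after n rows.
    // ';' is the character right after ':', so [prefix, stop) covers exactly that prefix.
    List<String> latest(String conversation, int n) {
        List<String> out = new ArrayList<>();
        for (String message : rows.subMap(conversation + ":", conversation + ";").values()) {
            if (out.size() == n) {
                break;
            }
            out.add(message);
        }
        return out;
    }

    // "Messages since t": newest rows come first, so scan up to and including the key for t.
    List<String> since(String conversation, long sinceTimestamp) {
        String start = conversation + ":";
        String stop = rowKey(conversation, sinceTimestamp);
        return new ArrayList<>(rows.subMap(start, true, stop, true).values());
    }

    public static void main(String[] args) {
        CompositeKeyMessages table = new CompositeKeyMessages();
        table.put("user1:user2", 1L, "Hello");
        table.put("user1:user2", 2L, "How's it going?");
        table.put("user1:user2", 3L, "Not bad");
        System.out.println(table.latest("user1:user2", 25)); // last 25 messages
        System.out.println(table.since("user1:user2", 2L));  // since t2
    }
}
```

The trade-off I'm trying to weigh is that here each message is its own row,
whereas in the versioned design every message lives in the same cell.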

Thanks in advance.

Mark
