Hi,
I'm completely new to HBase and have some questions regarding cell timestamps.
My questions: Are there practical limitations to the number of versions
(timestamps) a cell can have? Can a cell have, say, a million versions? What
are the consequences of this many versions for performance and system
requirements? Or should composite row keys be used as sorted indexes instead
when the numbers are this high?
To illustrate my questions, I'm modelling the messages exchanged between any 2
users on our system. The table is called "messages", the row key is a composite
of the two users' ids involved in the message exchange (e.g. "user1:user2"). A
column (e.g. "exchanges:message") contains a cell that is regularly updated
with the last message between those users. The cell's timestamp is then used in
conjunction with Get.setMaxVersions() and Get.setTimeRange() to enable queries
such as "Get the messages exchanged between user1 and user2 since 12th October
12:02:02" or "Get the last 25 messages exchanged between user1 and user2" or
"Get all messages exchanged between user1 and user2".
messages : {
  …
  user1:user2 : {
    exchanges:message : {
      ...
      t3: "Not bad",
      t2: "How's it going?",
      t1: "Hello"
    }
  },
  …
}
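To make the three queries concrete, here is a rough in-memory sketch of how I
picture a single versioned cell behaving. It is plain Java and does not use
the HBase client API; the class and method names (VersionedCellSketch, latest,
since) are made up for illustration. latest(n) stands in for
Get.setMaxVersions(n) and since(t) for Get.setTimeRange(t, Long.MAX_VALUE).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for one versioned cell (hypothetical, not HBase code).
public class VersionedCellSketch {
    // Versions keyed by timestamp and kept newest-first.
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Comparator.reverseOrder());

    // Store a message as a new version of the cell.
    public void put(long timestamp, String message) {
        versions.put(timestamp, message);
    }

    // "Get the last N messages": at most maxVersions values, newest first.
    public List<String> latest(int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String message : versions.values()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(message);
        }
        return out;
    }

    // "Get the messages since T": all versions with timestamp >= minTimestamp.
    // With the reversed comparator, headMap(min, true) holds exactly those keys.
    public List<String> since(long minTimestamp) {
        return new ArrayList<>(versions.headMap(minTimestamp, true).values());
    }
}
```

"Get all messages" would then be latest(Integer.MAX_VALUE), which is the case
I am worried about once a cell holds a very large number of versions.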
Over time, the number of messages exchanged between the 2 users will be
substantial - and growing. I'm concerned that cell versioning was NOT intended
for this purpose, and that there might be consequences for having, say, a
million versions of a cell.
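For comparison, this is roughly what I mean by using composite row keys as a
sorted index: one row per message, with a reverse timestamp appended to the
user pair so that the newest message sorts first. Again this is only a plain
Java sketch with a TreeMap standing in for the sorted table; the key layout
and the names (CompositeKeySketch, rowKey, latest) are my own assumptions, and
timestamps are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a table sorted by row key (hypothetical, not HBase code).
public class CompositeKeySketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Assumed key layout: "<userA>:<userB>:<zero-padded reverse timestamp>".
    // Subtracting from Long.MAX_VALUE makes newer messages sort earlier.
    public static String rowKey(String userPair, long timestamp) {
        return String.format("%s:%019d", userPair, Long.MAX_VALUE - timestamp);
    }

    // Store each message in its own row.
    public void put(String userPair, long timestamp, String message) {
        rows.put(rowKey(userPair, timestamp), message);
    }

    // Prefix scan over one user pair: newest first, at most limit messages.
    public List<String> latest(String userPair, int limit) {
        String prefix = userPair + ":";
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> row : rows.tailMap(prefix, true).entrySet()) {
            if (!row.getKey().startsWith(prefix) || out.size() == limit) {
                break;
            }
            out.add(row.getValue());
        }
        return out;
    }
}
```

With this layout "the last 25 messages" becomes a short scan from the start of
the user pair's key range, rather than a read of many versions of one cell.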
Thanks in advance.
Mark