Hi!

See inline ...

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of
Jean-Daniel Cryans
Sent: Thursday, October 6, 2011 20:17
To: [email protected]
Subject: Re: Question on timestamp, timeranges

We had a discussion about timestamps recently and, as I said there,
the general rule is not to try to use the timestamps.

About the other questions:

On Thu, Oct 6, 2011 at 6:03 AM, Steinmaurer Thomas <
[email protected]> wrote:

> Hello,
>
> we are thinking about using the internal timestamp for processing a
> range of newly inserted records in an HBase table. Is the timestamp a
> reliable way to go?


> Hard to tell without really knowing what you're trying to do, but my
> default answer is no. If the timestamp is part of your data model, it
> should be inside your row key or a column.

It's part of our rowkey, but for scalability reasons it's the last part
of a three-part rowkey, e.g.:

part1-part2-YYYYMMDDhhmmss

This is perfect for our ad-hoc queries on part1/part2 for a given day
via a web front end.
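
To illustrate the per-day lookup on this key layout, here is a minimal
sketch of how the start and stop row for one part1/part2 prefix and one
day could be derived. The function name day_scan_bounds is hypothetical,
and the sketch assumes the stop row is treated as exclusive, as it is in
an HBase scan.

```python
from datetime import date, timedelta
from typing import Tuple


def day_scan_bounds(part1: str, part2: str, day: date) -> Tuple[str, str]:
    """Return (start_row, stop_row) covering one day for one part1/part2 prefix.

    Keys follow the layout part1-part2-YYYYMMDDhhmmss. The stop row is the
    first possible key of the following day, so it is meant to be exclusive.
    """
    prefix = f"{part1}-{part2}-"
    next_day = day + timedelta(days=1)
    start_row = prefix + day.strftime("%Y%m%d") + "000000"
    stop_row = prefix + next_day.strftime("%Y%m%d") + "000000"
    return start_row, stop_row


if __name__ == "__main__":
    start, stop = day_scan_bounds("part1", "part2", date(2011, 10, 5))
    print(start)
    print(stop)
```

Because the date sits at the end of the key, this only narrows the scan
when part1 and part2 are both known.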

But what we are also trying to do is to process rows, either via a
client or an M/R job, that were inserted e.g. yesterday, in order to
calculate daily aggregated values. As our timestamp is at the end of
the rowkey, we thought about setting the time range of a scanner object
as the filter criterion when starting the M/R job. While not perfect,
it's better than doing a full scan of the table, I guess.
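
As a rough sketch of the time-range idea, the bounds for "yesterday"
could be computed as below. The function name yesterday_time_range_ms is
hypothetical. The sketch assumes that cell timestamps are the default
server-assigned milliseconds since the epoch and that a day is cut at
UTC midnight. The two values would then be handed to the scanner, e.g.
via Scan.setTimeRange(minStamp, maxStamp) in the Java client, which
treats the upper bound as exclusive.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer milliseconds since the epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def yesterday_time_range_ms(now: datetime) -> Tuple[int, int]:
    """Return (min_stamp, max_stamp) for the UTC day before `now`.

    min_stamp is inclusive and max_stamp is exclusive, i.e. the range
    runs from yesterday 00:00:00 UTC up to, but not including, today
    00:00:00 UTC.
    """
    now_utc = now.astimezone(timezone.utc)
    midnight_today = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    midnight_yesterday = midnight_today - timedelta(days=1)
    return to_epoch_ms(midnight_yesterday), to_epoch_ms(midnight_today)


if __name__ == "__main__":
    min_stamp, max_stamp = yesterday_time_range_ms(datetime.now(timezone.utc))
    print(min_stamp, max_stamp)
```

Note that such a range matches on the time a cell was written, not on
the date stored in the rowkey, so rows rewritten yesterday would be
picked up as well.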


> When is a timestamp updated? For sure when inserting HBase rows, but I
> wonder what happens when overwriting existing rows with unchanged cell
> values?

> Every time a new cell is inserted, even if it has the same data
> (because if HBase had to check that, it would have to do a read for
> every write, right?).

Ok, Thanks.


Regards,
Thomas
