Actually, another question: are there issues with multiple puts having the same 
timestamp? I.e. I write a value with timestamp = today 12:00. I later change my 
mind and want to rewrite a different value but with the same timestamp. Would 
that present problems?

Thanks!
-Ben


----- Original Message -----
From: Ben West <[email protected]>
To: "[email protected]" <[email protected]>
Cc: 
Sent: Thursday, October 20, 2011 9:13 AM
Subject: Re: Custom timestamps

Thanks Stack. We are indeed using locks outside of HBase, but I hadn't heard 
about the problems with HBase's locks. Good to know.

-Ben


----- Original Message -----
From: Stack <[email protected]>
To: [email protected]; Ben West <[email protected]>
Cc: 
Sent: Wednesday, October 19, 2011 5:24 PM
Subject: Re: Custom timestamps

On Wed, Oct 19, 2011 at 12:18 PM, Ben West <[email protected]> wrote:
> We're storing timestamped data in HBase; from lurking on the mailing list it 
> seems like the recommendation is usually to make the timestamp part of the 
> row key. I'm curious why this is - is scanning over rows more efficient than 
> scanning over timestamps within a cell?
>

I'd be surprised if there were a noticeable difference.

It depends on how you access the data.  In the tsdb case, for
instance, it wants to get all metrics within a particular time range.
If the timestamp it used were that of the hbase system, then you'd
have to do a full table scan each time to find metrics that had been
fired during a particular time period -- i.e. you'd check each row to
see if it has any entries for the time period you are interested
in -- whereas if the timestamp is part of the row key, you instead just
have to start scanning at the opening of the time range you are
querying about.
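
To make the difference concrete, here is a minimal sketch that models a table
as a sorted in-memory map rather than calling the HBase client. The key layout
(metric name, '#', zero-padded epoch seconds) and all class and method names
are assumptions made up for illustration; they are not what tsdb or HBase
actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class RowKeyScanSketch {

    // Hypothetical key layout: metric name, '#', zero-padded epoch seconds.
    // Zero padding keeps lexicographic order equal to numeric order
    // (for non-negative timestamps of at most 10 digits).
    static String rowKey(String metric, long epochSeconds) {
        return String.format("%s#%010d", metric, epochSeconds);
    }

    // Time in the row key: seek to the start key, stop before the end key.
    static List<String> rangeScan(NavigableMap<String, String> table,
                                  String metric, long from, long to) {
        return new ArrayList<>(
            table.subMap(rowKey(metric, from), true, rowKey(metric, to), false).values());
    }

    // Time not in the row key: every row has to be visited and checked.
    static List<String> fullScan(Map<String, Long> eventTimeByRow, long from, long to) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Long> e : eventTimeByRow.entrySet()) {
            long t = e.getValue();
            if (t >= from && t < to) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> byKey = new TreeMap<>();
        byKey.put(rowKey("cpu", 100), "v100");
        byKey.put(rowKey("cpu", 200), "v200");
        byKey.put(rowKey("cpu", 300), "v300");
        byKey.put(rowKey("mem", 150), "m150");
        System.out.println(rangeScan(byKey, "cpu", 150, 301)); // prints [v200, v300]

        Map<String, Long> byId = new TreeMap<>();
        byId.put("a", 100L);
        byId.put("b", 200L);
        byId.put("c", 300L);
        System.out.println(fullScan(byId, 150, 301)); // prints [b, c]
    }
}
```

The range scan only touches the keys between the start and stop of the time
range; the full scan inspects every row no matter how narrow the range is.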


> The book says: "the version timestamp is used internally by HBase for things like 
> time-to-live calculations. It's usually best to avoid setting this timestamp 
> yourself. Prefer using a separate timestamp attribute of the row, or have the 
> timestamp a part of the rowkey, or both." I understand that TTL would be 
> ruined (or saved, depending on your goal) by custom timestamps, and I also 
> gather that the way HBase handles concurrency is through MVCC. But we are 
> using application level locks, and HBase's TTL functionality applying is a 
> bonus if anything.
>

The book's advice errs on the side of being conservative, I'd say.

The MVCC that we do internally does not use the cell timestamp but
instead a different running sequence number that is associated
(internally) with cells (I've not heard of an application atop hbase
using the hbase timestamps to do MVCC at the application level).

The locks you talk of, are these the locks provided in the hbase HTable
API?  If so, are you aware they are dangerous (see back in this
mailing list for an explanation)?

> So is there any reason why we shouldn't set the timestamps manually?
>

Generally, hbase works fine with user-set timestamps; there can be
issues ordering edits if clients have divergent clocks and the version
being set is time-based, but I'm probably not telling you anything you
don't already know.
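
As a rough illustration of the clock issue, the sketch below models one cell
as a map from version timestamp to value, where a read returns the value with
the highest timestamp. It is an in-memory model only, not HBase client code;
the class names and the clock values are hypothetical.

```java
import java.util.TreeMap;

final class ClockSkewSketch {

    // One cell: versions ordered by the client-supplied timestamp.
    static final class Cell {
        private final TreeMap<Long, String> versions = new TreeMap<>();

        void put(long timestamp, String value) {
            versions.put(timestamp, value);
        }

        // A plain read returns the version with the highest timestamp.
        String latest() {
            return versions.lastEntry().getValue();
        }
    }

    public static void main(String[] args) {
        Cell cell = new Cell();
        // Client A writes first; its clock reads 1000.
        cell.put(1000L, "written first by A");
        // Client B writes later in real time, but its clock runs behind
        // and reads 995, so its edit sorts as the older version.
        cell.put(995L, "written second by B");
        System.out.println(cell.latest()); // prints written first by A
    }
}
```

The edit that happened last in wall-clock terms is not the one a reader sees,
because ordering follows the timestamps the clients supplied.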

St.Ack
