Hi
 
I was wondering if I could get some advice about data management in HBase. I 
plan to use HBase to store data that has a variable-length lifespan: the vast 
majority will be short-lived, but occasionally the lifetime will be 
significantly longer (3 days versus 3 months). Once the lifespan is over I 
need the data to be deleted at some point in the near future (within a few days 
is fine). I don’t think I can use the standard TTL for this because it is fixed 
at the column family level. Therefore, my plan is to run a script every few days 
that looks through external information for what needs to be kept and then 
updates HBase in some way so that it knows what to keep. With that information 
in HBase I can then use the standard TTL mechanism to clean up.
 
The two ways I can think of to let HBase know are:
 
1. Add a co-processor that updates the timestamp on each read and then have my 
   process simply read the data. I shied away from this because the 
   documentation indicates that a co-processor can’t take row locks. Does that 
   imply that it shouldn’t modify the underlying data? For my use case the 
   timestamp doesn’t have to be perfect; the keys are created in such a way 
   that the underlying data is fixed at creation time.
2. Add an extra column to each row that acts as a cache flag, and rewrite it at 
   regular intervals so that the timestamp updates and prevents the TTL from 
   deleting the row.
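
To make the periodic job concrete, here is a rough sketch of the decision step only: given the last write time of each row and the externally supplied set of keys to keep, it picks the rows that would expire before the next run. It is plain Java with no HBase calls; the class name, method name, and parameters are hypothetical, and the actual write back to HBase is left as a comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: decides which rows the periodic job should rewrite.
class TtlRefreshPlanner {

    /**
     * @param lastWriteMs   row key -> timestamp (ms) of the most recent write
     * @param keep          row keys that the external information says must survive
     * @param nowMs         current time in ms
     * @param ttlMs         column family TTL in ms (assumed value, set elsewhere)
     * @param runIntervalMs time until the job runs again, in ms
     * @return row keys, in sorted order, that expire at or before the next run
     */
    static List<String> rowsToRefresh(Map<String, Long> lastWriteMs, Set<String> keep,
                                      long nowMs, long ttlMs, long runIntervalMs) {
        List<String> due = new ArrayList<>();
        long nextRunMs = nowMs + runIntervalMs;
        // TreeMap only makes the output order deterministic.
        for (Map.Entry<String, Long> e : new TreeMap<>(lastWriteMs).entrySet()) {
            if (!keep.contains(e.getKey())) {
                continue; // not wanted any more: leave it for the TTL to remove
            }
            long expiresAtMs = e.getValue() + ttlMs;
            if (expiresAtMs <= nextRunMs) {
                due.add(e.getKey()); // would be gone before we look again
            }
        }
        // For each returned key the real job would issue a write to HBase
        // so that the relevant timestamp moves forward (not shown here).
        return due;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        Map<String, Long> lastWrite = new HashMap<>();
        lastWrite.put("row-a", 9 * day); // written recently
        lastWrite.put("row-b", 5 * day); // close to expiry, still wanted
        lastWrite.put("row-c", 5 * day); // close to expiry, no longer wanted
        System.out.println(rowsToRefresh(lastWrite, Set.of("row-a", "row-b"),
                10 * day, 7 * day, 3 * day));
    }
}
```

This only decides which rows to touch. It assumes that whatever the job rewrites is what the TTL check looks at; if TTL is evaluated per cell rather than per row, the data cells themselves would need rewriting, not just a flag column.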
 
Are there any other best-practice alternatives?
 
Thanks
 
Richard
