Another option might be to set the proper TTL on the table? You alter the
table so that the TTL reflects your timestamp, then you run a compaction?
The issue is that you have to disable the table while you alter it.
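
To make "TTL reflects your timestamp" concrete: a column-family TTL is a
relative age in seconds, so an absolute cutoff has to be converted into
"now minus cutoff". Below is a minimal sketch of that arithmetic. The class
and method names (TtlFromCutoff, ttlSeconds) are made up for illustration and
are not part of HBase; rounding up is an assumption chosen so that data at
or after the cutoff is not expired early.

```java
import java.time.Duration;

// Hypothetical helper, not an HBase API: converts an absolute cutoff
// timestamp into a relative TTL in seconds.
final class TtlFromCutoff {

    // Returns the smallest whole number of seconds that covers the span
    // from cutoffMillis to nowMillis (both epoch milliseconds).
    static long ttlSeconds(long cutoffMillis, long nowMillis) {
        if (cutoffMillis > nowMillis) {
            throw new IllegalArgumentException("cutoff is in the future");
        }
        long ageMillis = nowMillis - cutoffMillis;
        // Round up so that cells written at or after the cutoff stay inside
        // the TTL window at the moment the TTL is applied.
        return (ageMillis + 999) / 1000;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Assumed example: the index rebuild job started six hours ago.
        long jobStart = now - Duration.ofHours(6).toMillis();
        System.out.println("TTL to set (seconds): " + ttlSeconds(jobStart, now));
    }
}
```

The resulting value would then go into something like
alter 'T2', NAME => 'f', TTL => <seconds> in the HBase shell ('f' is a
placeholder family name), followed by major_compact 'T2'. Note that the TTL
window keeps sliding forward with the clock, so newer data will eventually
age out too unless the TTL is raised again afterwards.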

JM

2013/7/16 Ted Yu <[email protected]>

> Would this method (of Delete) serve your need?
>
>   public Delete deleteFamily(byte [] family, long timestamp) {
> From its Javadoc:
>
>    * Delete all columns of the specified family with a timestamp less than
>    * or equal to the specified timestamp.
>
> On Mon, Jul 15, 2013 at 8:07 PM, Chao Shi <[email protected]> wrote:
>
> > Jean-Marc Spaggiari <jean-marc@...> writes:
> >
> > >
> > > When you send a delete command to the server, you can specify a
> > > timestamp. So as the result of your MR job, "just" emit this delete
> > > with the specific timestamp to remove any previous version?
> > >
> > > JM
> > >
> > > 2013/7/15 Chao Shi <stepinto@...>
> > >
> > > > Hi HBase users,
> > > >
> > > > We have created an index table (say T2) of another table (say T1). The
> > > > clients who write to T1 also write an index record to T2 with the same
> > > > timestamp. Inconsistencies may accumulate as time goes by, so we
> > > > run an MR job periodically, which fully scans T1, builds an index, and
> > > > bulk-loads the result to T2.
> > > >
> > > > Because the MR job may run for a while, all new data written to T2
> > > > during that period must be kept and not overwritten. So the MR job
> > > > creates puts using the timestamp at which the job starts.
> > > >
> > > > Then we want all data in T2 before a given timestamp to be invisible
> > > > to reads after the index builds successfully, and to get deleted
> > > > eventually (e.g. during major compaction). We prefer setting it
> > > > explicitly rather than using the TTL feature for safety, as we want
> > > > old data to be deleted only when the new data is written. Does HBase
> > > > support this kind of operation for now?
> > > >
> > > > Thanks,
> > > > Chao
> > > >
> > >
> >
> > Hi Jean-Marc,
> >
> > Thanks for the reply.
> >
> > I see delete can specify a timestamp, but I don't think that is what I
> > need. To clarify, in my scenario, I don't want to issue deletes for every
> > key (because I don't know exactly what to delete without doing another
> > full scan).
> >
> > I'd like to see if this is possible: set a min_timestamp on the
> > ColumnDescriptor. Once done, KVs before this timestamp become invisible
> > to reads. During major compaction, these KVs are deleted. It is the
> > absolute version of TTL.
> >
> >
> >
> >
> >
>
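
For reference, the min_timestamp behaviour asked about in the quoted message
can be modelled in a few lines of plain Java. This is only a toy in-memory
sketch of the requested semantics (hide on read, purge on major compaction);
MinTimestampModel and all of its methods are hypothetical and do not
correspond to any existing HBase class or setting.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed per-family min_timestamp. Hypothetical: this is
// not HBase code and does not use any HBase API.
final class MinTimestampModel {

    static final class Cell {
        final String row;
        final long timestamp;
        final String value;

        Cell(String row, long timestamp, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final List<Cell> stored = new ArrayList<>();
    private long minTimestamp = Long.MIN_VALUE; // no cutoff until one is set

    void put(String row, long timestamp, String value) {
        stored.add(new Cell(row, timestamp, value));
    }

    void setMinTimestamp(long minTimestamp) {
        this.minTimestamp = minTimestamp;
    }

    // Read path: cells strictly older than the cutoff are hidden,
    // but they are still physically stored.
    List<Cell> read() {
        List<Cell> visible = new ArrayList<>();
        for (Cell c : stored) {
            if (c.timestamp >= minTimestamp) {
                visible.add(c);
            }
        }
        return visible;
    }

    // Major compaction: physically drop the hidden cells and
    // return how many were removed.
    int majorCompact() {
        int before = stored.size();
        stored.removeIf(c -> c.timestamp < minTimestamp);
        return before - stored.size();
    }

    int storedCount() {
        return stored.size();
    }
}
```

In this model the cutoff is strict ("before this timestamp"), so puts written
by the MR job with the job start timestamp stay visible once the cutoff is
set to that same timestamp, and no per-row deletes are needed.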
