Yes, this is what we do now. We maintain a lower bound on the timestamp for scans. Once an index build is done, we raise it to a higher value.
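Below is a minimal in-memory sketch of this "scan watermark" approach. It assumes, as described in the thread, that every index entry carries an explicit timestamp. `Cell`, `IndexTable`, `put`, `advanceWatermark` and `scan` are hypothetical names for illustration only, not HBase APIs. Against a real cluster, the lower bound would go into the scan's time range instead, for example via `Scan#setTimeRange(minStamp, Long.MAX_VALUE)`, which reads versions in `[minStamp, maxStamp)`.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory sketch (not HBase code) of raising a scan lower bound after each index build. */
public class ScanWatermarkSketch {

    /** One index entry written with an explicit timestamp (hypothetical type). */
    public static final class Cell {
        final String row;
        final long timestamp;

        Cell(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical stand-in for the index table T2 plus the reader-side watermark. */
    public static final class IndexTable {
        private final List<Cell> cells = new ArrayList<>();
        // Lower bound applied to every scan; entries older than this are treated as stale.
        private long minScanTimestamp = 0L;

        public void put(String row, long timestamp) {
            cells.add(new Cell(row, timestamp));
        }

        /** Raise the watermark after an index rebuild succeeds; it never moves backwards. */
        public void advanceWatermark(long newLowerBound) {
            minScanTimestamp = Math.max(minScanTimestamp, newLowerBound);
        }

        /** Mimics a scan restricted to the time range [minScanTimestamp, Long.MAX_VALUE). */
        public List<String> scan() {
            List<String> rows = new ArrayList<>();
            for (Cell c : cells) {
                if (c.timestamp >= minScanTimestamp) {
                    rows.add(c.row);
                }
            }
            return rows;
        }
    }

    public static void main(String[] args) {
        IndexTable t2 = new IndexTable();
        t2.put("stale-entry", 100L);        // left over from an earlier inconsistency
        long jobStart = 200L;
        t2.put("rebuilt-entry", jobStart);  // bulk-loaded by the MR job, stamped with its start time
        t2.put("live-entry", 250L);         // written by a client while the job was running

        System.out.println(t2.scan());      // [stale-entry, rebuilt-entry, live-entry]
        t2.advanceWatermark(jobStart);      // the index build finished successfully
        System.out.println(t2.scan());      // [rebuilt-entry, live-entry]
    }
}
```

The watermark only hides stale entries from readers; physically removing them still needs a separate mechanism (deletes, TTL, or compaction), which is what the rest of the thread discusses.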
On Wed, Jul 17, 2013 at 2:50 AM, Jimmy Xiang <[email protected]> wrote:

> When you set up the MR, does it help to set a proper timestamp filter or
> time range in the scan object?
>
> On Tue, Jul 16, 2013 at 5:59 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>
>> Another option might be to set up the proper TTL on the table? You alter
>> the table to set the TTL to reflect your timestamp, then you run a
>> compaction? The issue is that you have to disable the table while you
>> alter it.
>>
>> JM
>>
>> 2013/7/16 Ted Yu <[email protected]>
>>
>>> Would this method (of Delete) serve your need?
>>>
>>>     public Delete deleteFamily(byte [] family, long timestamp)
>>>
>>> From its Javadoc:
>>>
>>>     Delete all columns of the specified family with a timestamp less than
>>>     or equal to the specified timestamp.
>>>
>>> On Mon, Jul 15, 2013 at 8:07 PM, Chao Shi <[email protected]> wrote:
>>>
>>>> Jean-Marc Spaggiari <jean-marc@...> writes:
>>>>
>>>>> When you send a delete command to the server, you can specify a
>>>>> timestamp. So as the result of your MR job, "just" emit this delete
>>>>> with the specific timestamp to remove any previous version?
>>>>>
>>>>> JM
>>>>>
>>>>> 2013/7/15 Chao Shi <stepinto@...>
>>>>>
>>>>>> Hi HBase users,
>>>>>>
>>>>>> We have created an index table (say T2) of another table (say T1).
>>>>>> The clients who write to T1 also write an index record to T2 with the
>>>>>> same timestamp. Inconsistencies may accumulate as time goes by, so we
>>>>>> periodically run an MR job that fully scans T1, builds an index, and
>>>>>> bulk-loads the result into T2.
>>>>>>
>>>>>> Because the MR job may run for a while, all new data written to T2
>>>>>> during that period must be kept and not be overridden. So the MR job
>>>>>> creates puts using the timestamp at which the job starts.
>>>>>>
>>>>>> Then we want all data in T2 before a given timestamp to become
>>>>>> invisible to reads once the index build succeeds, and to be deleted
>>>>>> eventually (e.g. during major compaction). For safety, we prefer
>>>>>> setting this explicitly rather than using the TTL feature, as we want
>>>>>> old data to be deleted only once the new data has been written. Does
>>>>>> HBase support this kind of operation now?
>>>>>>
>>>>>> Thanks,
>>>>>> Chao
>>>>
>>>> Hi Jean-Marc,
>>>>
>>>> Thanks for the reply.
>>>>
>>>> I see delete can specify a timestamp, but I don't think that is what I
>>>> need. To clarify, in my scenario I don't want to issue deletes for every
>>>> key (because I don't know exactly what to delete unless I do another
>>>> full scan).
>>>>
>>>> I'd like to see if this is possible: set a min_timestamp on the
>>>> ColumnDescriptor. Once set, KVs before this timestamp become invisible
>>>> to reads. During major compaction, these KVs are deleted. It is the
>>>> absolute version of TTL.
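The proposed per-family min_timestamp ("absolute TTL") can be sketched in memory to make its semantics concrete. This is not an existing HBase feature: `MinTimestampFamily`, `setMinTimestamp`, `majorCompact` and `storedVersions` are hypothetical names. The sketch assumes that, as in HBase, the version with the highest timestamp is the one a read returns. This is why the MR job's puts stamped with the job start time do not override newer client writes. Note the boundary difference: `Delete#deleteFamily(family, timestamp)` removes versions with a timestamp less than *or equal to* the given value, while this sketch hides versions strictly *below* the minimum.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory sketch of a hypothetical per-family min_timestamp; not an HBase API. */
public class MinTimestampSketch {

    public static class MinTimestampFamily {
        // row -> (timestamp -> value); the highest timestamp is the current version.
        private final Map<String, NavigableMap<Long, String>> rows = new HashMap<>();
        private long minTimestamp = Long.MIN_VALUE;

        public void put(String row, long timestamp, String value) {
            rows.computeIfAbsent(row, r -> new TreeMap<>()).put(timestamp, value);
        }

        /** Hypothetical setting: versions older than this become invisible to reads. */
        public void setMinTimestamp(long minTimestamp) {
            this.minTimestamp = minTimestamp;
        }

        /** Returns the newest version if it is at or above minTimestamp, else null. */
        public String get(String row) {
            NavigableMap<Long, String> versions = rows.get(row);
            if (versions == null || versions.isEmpty()) {
                return null;
            }
            Map.Entry<Long, String> latest = versions.lastEntry();
            return latest.getKey() >= minTimestamp ? latest.getValue() : null;
        }

        /** Simulated major compaction: physically drop every version below minTimestamp. */
        public void majorCompact() {
            for (NavigableMap<Long, String> versions : rows.values()) {
                versions.headMap(minTimestamp, false).clear();
            }
            rows.values().removeIf(NavigableMap::isEmpty);
        }

        /** Number of versions still stored for a row (0 once compaction removed it). */
        public int storedVersions(String row) {
            NavigableMap<Long, String> versions = rows.get(row);
            return versions == null ? 0 : versions.size();
        }
    }

    public static void main(String[] args) {
        MinTimestampFamily t2 = new MinTimestampFamily();
        t2.put("idx:a", 100L, "stale");         // inconsistent entry from the past
        long jobStart = 200L;
        t2.put("idx:b", jobStart, "rebuilt");   // MR output, stamped with the job start time
        t2.put("idx:b", 250L, "client");        // client write during the job: newer, so it wins

        t2.setMinTimestamp(jobStart);           // index build succeeded
        System.out.println(t2.get("idx:a"));    // null (hidden from reads)
        System.out.println(t2.get("idx:b"));    // client
        t2.majorCompact();
        System.out.println(t2.storedVersions("idx:a")); // 0
        System.out.println(t2.storedVersions("idx:b")); // 2
    }
}
```

Unlike a TTL, the cutoff here moves only when someone sets it explicitly, which matches the stated goal of deleting old data only after the new index has been written. A real implementation would also interact with the family's max-versions setting, which this sketch ignores.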
