Jean-Marc Spaggiari <jean-marc@...> writes:

>
> When you send a delete command to the server, you can specify a timestamp.
> So, as the result of your MR job, could you "just" emit this delete with
> the specific timestamp to remove any previous versions?
>
> JM
>
> 2013/7/15 Chao Shi <stepinto@...>
>
> > Hi HBase users,
> >
> > We have created an index table (say T2) of another table (say T1). The
> > clients who write to T1 also write an index record to T2 with the same
> > timestamp. Inconsistencies may accumulate as time goes by, so we
> > periodically run an MR job that fully scans T1, builds an index, and
> > bulk-loads the result into T2.
> >
> > Because the MR job may run for a while, all new data written to T2
> > during that period must be kept and not be overwritten. So the MR job
> > creates puts using the timestamp at which the job starts.
> >
> > Then, once the index has been built successfully, we want all data in T2
> > older than a given timestamp to become invisible to reads and eventually
> > be deleted (e.g. during major compaction). For safety, we prefer setting
> > this explicitly rather than using the TTL feature, because we want old
> > data to be deleted only after the new data has been written. Does HBase
> > currently support this kind of operation?
> >
> > Thanks,
> > Chao
>
Hi Jean-Marc,

Thanks for the reply. I see that a delete can specify a timestamp, but I don't think that is what I need. To clarify: in my scenario, I don't want to issue deletes for every key, because I don't know exactly what to delete unless I do another full scan.

I'd like to know whether something like this is possible: set a min_timestamp on the ColumnDescriptor. Once it is set, KVs older than this timestamp become invisible to reads, and during major compaction these KVs are deleted. In effect, it would be an absolute version of TTL.
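To make the proposed semantics concrete, here is a minimal Java sketch that models one column family in memory. `MinTimestampSketch`, `Family`, `setMinTimestamp` and `majorCompact` are hypothetical names for illustration only; they are not HBase client APIs. The sketch assumes the cutoff is exclusive (cells strictly older than min_timestamp are hidden) and ignores max-versions, delete markers and TTL. It replays the scenario from the thread: old index entries, an MR rebuild written at the job's start timestamp, and a client write that lands while the job runs.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the proposed per-family "min_timestamp".
// None of these names exist in HBase; they only illustrate the semantics.
public class MinTimestampSketch {

    static final class Family {
        // row key -> (timestamp -> value), timestamps in ascending order
        private final TreeMap<String, TreeMap<Long, String>> rows = new TreeMap<>();
        // Absolute cutoff: cells with timestamp < minTimestamp are invisible.
        private long minTimestamp = Long.MIN_VALUE;

        void put(String row, long ts, String value) {
            rows.computeIfAbsent(row, k -> new TreeMap<>()).put(ts, value);
        }

        void setMinTimestamp(long ts) {
            minTimestamp = ts;
        }

        // Read path: return the newest version, unless even that one is
        // below the cutoff (then every older version is hidden as well).
        String get(String row) {
            TreeMap<Long, String> versions = rows.get(row);
            if (versions == null || versions.isEmpty()) {
                return null;
            }
            Map.Entry<Long, String> newest = versions.lastEntry();
            return newest.getKey() >= minTimestamp ? newest.getValue() : null;
        }

        // "Major compaction": physically drop every cell below the cutoff,
        // then drop rows that have no cells left.
        void majorCompact() {
            for (TreeMap<Long, String> versions : rows.values()) {
                versions.headMap(minTimestamp).clear(); // keys strictly < cutoff
            }
            rows.values().removeIf(TreeMap::isEmpty);
        }

        int cellCount() {
            return rows.values().stream().mapToInt(TreeMap::size).sum();
        }
    }

    public static void main(String[] args) {
        Family t2 = new Family();
        // Index entries written by clients before the rebuild; "stale" no
        // longer matches any T1 row (an accumulated inconsistency).
        t2.put("idx-a", 100L, "a-old");
        t2.put("stale", 100L, "orphan");

        long jobStart = 200L;                    // MR job start timestamp
        t2.put("idx-b", 250L, "b-live");         // client write while the job runs
        t2.put("idx-a", jobStart, "a-rebuilt");  // bulk-loaded by the MR job
        t2.put("idx-b", jobStart, "b-rebuilt");

        t2.setMinTimestamp(jobStart);            // only after the bulk load succeeds
        System.out.println(t2.get("idx-a"));     // a-rebuilt
        System.out.println(t2.get("idx-b"));     // b-live
        System.out.println(t2.get("stale"));     // null

        t2.majorCompact();
        System.out.println(t2.cellCount());      // 3
    }
}
```

Setting the cutoff to the job's start timestamp hides the orphaned `stale` entry and the pre-rebuild version of `idx-a` without issuing a per-row delete, while the client write at timestamp 250 stays visible because it is newer than the cutoff.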
