When you send a Delete to the server, you can specify a timestamp. So, as the result of your MR job, could you "just" emit this delete with the specific timestamp to remove any previous versions?
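To make the idea concrete, here is a toy sketch in plain Python. It is not HBase client code: `ToyColumn`, `put`, `delete_up_to`, and the timestamps are hypothetical names and values made up for illustration. It assumes a delete marker with timestamp T hides every version whose timestamp is less than or equal to T, which is how HBase column and family deletes behave. It also assumes the delete is issued at the job start timestamp minus one, so that the puts written by the MR job at the job start timestamp are not masked.

```python
class ToyColumn:
    """Toy model of one column in one row of the index table (T2)."""

    def __init__(self):
        self.versions = {}        # timestamp -> value
        self.delete_markers = []  # timestamps of delete markers

    def put(self, ts, value):
        # Store one version of the cell at an explicit timestamp.
        self.versions[ts] = value

    def delete_up_to(self, ts):
        # Record a delete marker; it hides all versions with timestamp <= ts.
        self.delete_markers.append(ts)

    def visible_timestamps(self):
        # Versions strictly newer than the newest delete marker stay readable.
        cutoff = max(self.delete_markers, default=float("-inf"))
        return sorted(ts for ts in self.versions if ts > cutoff)

    def get(self):
        # Return the newest visible value, or None if everything is hidden.
        visible = self.visible_timestamps()
        return self.versions[visible[-1]] if visible else None


job_start = 1000  # hypothetical timestamp taken when the MR job starts

cell = ToyColumn()
cell.put(900, "stale index entry")                   # written before the job
cell.put(job_start, "rebuilt by MR job")             # MR puts use job_start
cell.put(1005, "written by a client during the job")

# After the rebuild succeeds, hide everything older than the job's own puts.
cell.delete_up_to(job_start - 1)
```

In this model the stale entry at 900 is no longer readable, while the rebuilt entry and the entry written by a client during the job both survive. In the real Java client, the `Delete` class accepts a row key and a timestamp; whether one delete per row or per column family fits best depends on the schema.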

JM

2013/7/15 Chao Shi <[email protected]>

> Hi HBase users,
>
> We have created an index table (say T2) of another table (say T1). The
> clients who write to T1 also write an index record to T2 with the same
> timestamp. Inconsistencies may accumulate as time goes by, so we
> periodically run an MR job that fully scans T1, builds an index, and
> bulk-loads the result into T2.
>
> Because the MR job may run for a while, all new data written to T2
> during that period must be kept and not be overwritten. So the MR job
> creates puts using the timestamp at which the job starts.
>
> Then we want all data in T2 before a given timestamp to be invisible to
> reads after the index builds successfully, and to get deleted eventually
> (e.g. during major compaction). We prefer setting it explicitly to using
> the TTL feature for safety, as we want old data to be deleted only when
> the new data is written. Does HBase support this kind of operation for now?
>
> Thanks,
> Chao
