I also have a short blog post about this here: http://hadoop-hbase.blogspot.com/2011/12/deletion-in-hbase.html
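
A toy Python model of the masking discussed in the thread below may help. It is only a sketch under two assumptions taken from that discussion: a delete marker hides every cell in its row whose timestamp is at or below the marker's timestamp, and a major compaction drops both the marker and the cells it hides. The class `ToyStore`, its methods, and the `now=1000` stand-in for the wall clock are hypothetical names invented for illustration; none of this is HBase code or its API.

```python
class ToyStore:
    """Toy model of delete-marker masking; not HBase code."""

    def __init__(self):
        self.cells = []      # list of (row, timestamp, value)
        self.markers = {}    # row -> newest delete-marker timestamp

    def put(self, row, ts, value):
        # A put with an explicit timestamp is stored as given.
        self.cells.append((row, ts, value))

    def delete_all(self, row, now):
        # A delete is itself a write, stamped with the current time.
        self.markers[row] = max(now, self.markers.get(row, now))

    def _masked(self, row, ts):
        # Assumption: a marker hides cells at or below its timestamp.
        return row in self.markers and ts <= self.markers[row]

    def scan(self):
        # Return the newest visible (timestamp, value) per row.
        latest = {}
        for row, ts, value in self.cells:
            if self._masked(row, ts):
                continue
            if row not in latest or ts >= latest[row][0]:
                latest[row] = (ts, value)
        return latest

    def major_compact(self):
        # Assumption: compaction drops masked cells and all markers.
        self.cells = [c for c in self.cells if not self._masked(c[0], c[1])]
        self.markers.clear()


store = ToyStore()
store.put("row4", 10, "value")
store.delete_all("row4", now=1000)
store.put("row4", 10, "value")    # older than the marker, so masked
masked = store.scan()
store.major_compact()             # marker and masked cells are dropped
after_compact = store.scan()
store.put("row4", 10, "value")    # no marker is left to mask this put
visible = store.scan()
```

In this model `masked` and `after_compact` are both empty, and only the put issued after the compaction shows up in `visible`, which mirrors the shell session quoted below.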
________________________________
From: Harsh J <[email protected]>
To: [email protected]
Sent: Wednesday, August 15, 2012 5:50 AM
Subject: Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails

Yonghu,

You are correct at that. Until a major_compact finishes, inserting
with old timestamps will never show. Inserted old timestamped values
before a major compact but after a delete will all go away. That is
why I had to put in the data into the table _after_ the major_compact
ran, in that shell output I'd sent.

On Wed, Aug 15, 2012 at 5:18 PM, yonghu <[email protected]> wrote:
> Hi Harsh,
>
> I have a question of your description. The deleted tag masks the new
> inserted value with old timestamp, that's why the new inserted data
> can'be seen. But after major compaction, this new value will be seen
> again. So, the question is that how the deletion really executes. In
> my understanding, the deletion will delete all the data values which
> TSs are less equal than the TS of the deleted tag. So, if you insert a
> value with old TS after you insert a deleted tag, it should also be
> deleted at the compaction time. For example, if I first insert
> (k1,t1), and then delete (k1,t1) with deleted tag which TS is greater
> than t1, then reinsert (k1,t1) again. So, at the compaction time, two
> (k1,t1) should be deleted.
>
> wish your response!
>
> Yong
>
>
>
> On Wed, Aug 15, 2012 at 7:53 AM, Takahiko Kawasaki <[email protected]> wrote:
>> Dear Harsh,
>>
>> Thank you very much for your detailed explanation. I could understand
>> what had been going on during my put/scan/delete operations. I'll modify
>> my application and test programs taking the timestamp implementation
>> into consideration.
>>
>> Best Regards,
>> Takahiko Kawasaki
>>
>> 2012/8/15 Harsh J <[email protected]>
>>
>>> When a Delete occurs, an insert is made with the timestamp being the
>>> current time (to indicate it is the latest version). Hence, when you
>>> insert a value after this with an _older_ timestamp, it is not taken
>>> in as the latest version, and is hence ignored when scanning. This is
>>> why you do not see the data.
>>>
>>> If you instead insert this after a compaction has fully run on this
>>> store file, then your value will indeed get shown after insert, cause
>>> at that moment there wouldn't exist such a row with a latest timestamp
>>> at all.
>>>
>>> hbase(main):060:0> flush 'test-table'
>>> 0 row(s) in 0.1020 seconds
>>>
>>> hbase(main):061:0> major_compact 'test-table'
>>> 0 row(s) in 0.0400 seconds
>>>
>>> hbase(main):062:0> put 'test-table', 'row4', 'test-family', 'value', 10
>>> 0 row(s) in 0.0230 seconds
>>>
>>> hbase(main):063:0> scan 'test-table'
>>> ROW                COLUMN+CELL
>>>  row4              column=test-family:, timestamp=10, value=value
>>> 1 row(s) in 0.0060 seconds
>>>
>>> I suppose this is why it is recommended not to mess with the
>>> timestamps manually, and instead just rely on versions.
>>>
>>> On Tue, Aug 14, 2012 at 8:24 PM, Takahiko Kawasaki <[email protected]>
>>> wrote:
>>> > Hello,
>>> >
>>> > I have a problem where 'put' with timestamp does not succeed.
>>> > I did the following at the HBase shell.
>>> >
>>> > (1) Do 'put' with timestamp.
>>> >       # 'scan' shows 1 row.
>>> >
>>> > (2) Delete the row by 'deleteall'.
>>> >       # 'scan' says "0 row(s)".
>>> >
>>> > (3) Do 'put' again by the same command line as (1).
>>> >       # 'scan' says "0 row(s)" ! Why?
>>> >
>>> > (4) Increment the timestamp value by 1 and try 'put' again.
>>> >       # 'scan' still says "0 row(s)"! Why?
>>> >
>>> > The command lines I actually typed are as follows and the attached
>>> > file is the output from the command lines.
>>> >
>>> > scan 'test-table'
>>> > put 'test-table', 'row3', 'test-family', 'value'
>>> > scan 'test-table'
>>> > deleteall 'test-table', 'row3'
>>> > scan 'test-table'
>>> > put 'test-table', 'row3', 'test-family', 'value'
>>> > scan 'test-table'
>>> > deleteall 'test-table', 'row3'
>>> > scan 'test-table'
>>> > put 'test-table', 'row4', 'test-family', 'value', 10
>>> > scan 'test-table'
>>> > deleteall 'test-table', 'row4'
>>> > scan 'test-table'
>>> > put 'test-table', 'row4', 'test-family', 'value', 10
>>> > scan 'test-table'
>>> > put 'test-table', 'row4', 'test-family', 'value', 10
>>> > scan 'test-table'
>>> > quit
>>> >
>>> > Is this behavior the HBase specification?
>>> >
>>> > My cluster is built using CDH4 and the HBase version is 0.92.1-cdh4.0.0.
>>> >
>>> > Could anyone give me any insight, please?
>>> >
>>> > Best Regards,
>>> > Takahiko Kawasaki
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>

--
Harsh J
