https://issues.apache.org/jira/browse/HBASE-9005 :) Just have to do it now.
________________________________
From: Ted Yu <[email protected]>
To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
Sent: Monday, December 9, 2013 8:16 PM
Subject: Re: Why does a delete behave like this?

I ran the following shell command to create the table:

hbase(main):001:0> create 't1', {NAME => 'cf', KEEP_DELETED_CELLS => true}

The second get command returns the same result as the first.

Lars:
The refguide doesn't cover such usage. Do you think we should document it?

Cheers

On Mon, Dec 9, 2013 at 2:53 PM, lars hofhansl <[email protected]> wrote:

>This is because by default a delete marker extends all the way back in time.
>When you set KEEP_DELETED_CELLS for your column family this behavior is fixed.
>I.e. you get correct timerange query behavior even w.r.t. deletes.
>
>
>-- Lars
>
>
>
>________________________________
> From: Niels Basjes <[email protected]>
>To: user <[email protected]>
>Sent: Monday, December 9, 2013 12:47 AM
>Subject: Why does a delete behave like this?
>
>
>
>Hi,
>
>When I first started learning about HBase I compared the logic of setting
>new values to something that is similar to the way a tool like Subversion
>works: When you set a new value you don't overwrite the old one, you simply
>create a new version.
>Just like Subversion you can then at a later moment retrieve the old value
>and that way the situation at an earlier date.
>
>(The only real variation to the SVN model is that HBase only retains the
>last N versions of a cell.)
>
>There is however one situation where this comparison really fails: When you
>do a delete on a cell.
>If you want to retrieve the state of a thing from Subversion and in the
>current version this thing has been deleted then you can still get it back.
>With HBase however if you delete a cell you place a tombstone at a specific
>time and as such internally the older values are still present.
>
>But when you try to retrieve such an older value then you still get an
>empty result back (i.e. no such cell).
>The direct consequence of the currently implemented model is that an
>application can never retrieve the correct state of a row at an older
>timestamp if a delete on any cell has occurred.
>
>Example:
>
>I create a table with one row:
>
>> create 't1', 'cf'
>> put 't1', 'rowid', 'cf:1', 'One', 1000
>> put 't1', 'rowid', 'cf:2', 'Two', 2000
>> put 't1', 'rowid', 'cf:3', 'Three', 3000
>> get 't1', 'rowid' , {TIMERANGE => [0,3500]}
>
> COLUMN     CELL
> cf:1       timestamp=1000, value=One
> cf:2       timestamp=2000, value=Two
> cf:3       timestamp=3000, value=Three
> 3 row(s) in 0.0150 seconds
>
>Then the delete of a cell at a later timestamp:
>
>> delete 't1', 'rowid', 'cf:1', 4000
>
>Now if I retrieve the row at time 3500 I would find it logical that I would
>still see the same values as I would above.
>This is however the reality:
>
>> get 't1', 'rowid' , {TIMERANGE => [0,3500]}
>
> COLUMN     CELL
> cf:2       timestamp=2000, value=Two
> cf:3       timestamp=3000, value=Three
> 2 row(s) in 0.0120 seconds
>
>
>Why has it been designed/implemented like this?
>What is the logic behind this model?
>
>--
>Best regards / Met vriendelijke groeten,
>
>Niels Basjes
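
The difference discussed in this thread can be mimicked with a small toy model. This is only a sketch in Python, not HBase code: `ToyColumnFamily` and `replay` are hypothetical names invented for illustration, and the masking rule is an assumption based on Lars's explanation (by default a delete marker hides every older version no matter which time range is queried; with KEEP_DELETED_CELLS a marker outside the queried time range is ignored). The time range is treated as [min, max), i.e. the upper bound is exclusive.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumnFamily:
    """Toy stand-in for one row of a column family (NOT real HBase code)."""
    keep_deleted_cells: bool = False
    puts: list = field(default_factory=list)     # (column, timestamp, value)
    deletes: list = field(default_factory=list)  # (column, timestamp) markers

    def put(self, column, value, ts):
        self.puts.append((column, ts, value))

    def delete(self, column, ts):
        # A delete only adds a tombstone; older values stay stored.
        self.deletes.append((column, ts))

    def _masked(self, column, ts, min_ts, max_ts):
        for del_column, del_ts in self.deletes:
            if del_column != column or del_ts < ts:
                continue  # marker is for another column or older than the put
            if self.keep_deleted_cells and not (min_ts <= del_ts < max_ts):
                continue  # assumed rule: marker outside the range is ignored
            return True
        return False

    def get(self, min_ts, max_ts):
        """Newest visible version per column within [min_ts, max_ts)."""
        result = {}
        for column, ts, value in self.puts:
            if not (min_ts <= ts < max_ts):
                continue
            if self._masked(column, ts, min_ts, max_ts):
                continue
            if column not in result or ts > result[column][0]:
                result[column] = (ts, value)
        return result


def replay(keep_deleted_cells, min_ts=0, max_ts=3500):
    """Replay the puts and the delete from Niels's example, then query."""
    cf = ToyColumnFamily(keep_deleted_cells=keep_deleted_cells)
    cf.put("cf:1", "One", 1000)
    cf.put("cf:2", "Two", 2000)
    cf.put("cf:3", "Three", 3000)
    cf.delete("cf:1", 4000)
    return cf.get(min_ts, max_ts)


print(sorted(replay(keep_deleted_cells=False)))  # ['cf:2', 'cf:3']
print(sorted(replay(keep_deleted_cells=True)))   # ['cf:1', 'cf:2', 'cf:3']
```

In this model, widening the queried range so that it includes timestamp 4000 (for example `replay(True, 0, 5000)`) hides cf:1 again, because the tombstone then falls inside the range.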
