I ran the following shell command to create the table:
hbase(main):001:0> create 't1', {NAME => 'cf', KEEP_DELETED_CELLS => true}

The second get command returns the same result as the first.

Lars:
The refguide doesn't cover such usage. Do you think we should document it?

Cheers

On Mon, Dec 9, 2013 at 2:53 PM, lars hofhansl <[email protected]> wrote:

> This is because by default a delete marker extends all the way back in time.
> When you set KEEP_DELETED_CELLS for your column family this behavior is
> fixed. I.e. you get correct timerange query behavior even w.r.t. deletes.
>
>
> -- Lars
>
>
>
> ________________________________
> From: Niels Basjes <[email protected]>
> To: user <[email protected]>
> Sent: Monday, December 9, 2013 12:47 AM
> Subject: Why does a delete behave like this?
>
>
> Hi,
>
> When I first started learning about HBase I compared the logic of setting
> new values to something that is similar to the way a tool like Subversion
> works: When you set a new value you don't overwrite the old one, you simply
> create a new version.
> Just like Subversion you can then at a later moment retrieve the old value
> and that way see the situation at an earlier date.
>
> (The only real variation to the SVN model is that HBase only retains the
> last N versions of a cell.)
>
> There is however one situation where this comparison really fails: When you
> do a delete on a cell.
> If you want to retrieve the state of a thing from Subversion and in the
> current version this thing has been deleted then you can still get it back.
> With HBase however if you delete a cell you place a tombstone at a specific
> time and as such internally the older values are still present.
>
> But when you try to retrieve such an older value then you still get an
> empty result back (i.e. no such cell).
> The direct consequence of the currently implemented model is that an
> application can never retrieve the correct state of a row at an older
> timestamp if a delete on any cell has occurred.
>
> Example:
>
> I create a table with one row:
>
> > create 't1', 'cf'
> > put 't1', 'rowid', 'cf:1', 'One', 1000
> > put 't1', 'rowid', 'cf:2', 'Two', 2000
> > put 't1', 'rowid', 'cf:3', 'Three', 3000
> > get 't1', 'rowid', {TIMERANGE => [0,3500]}
>
> COLUMN    CELL
> cf:1      timestamp=1000, value=One
> cf:2      timestamp=2000, value=Two
> cf:3      timestamp=3000, value=Three
> 3 row(s) in 0.0150 seconds
>
> Then the delete of a cell at a later timestamp:
>
> > delete 't1', 'rowid', 'cf:1', 4000
>
> Now if I retrieve the row at time 3500 I would find it logical that I would
> still see the same values as I would above.
> This is however the reality:
>
> > get 't1', 'rowid', {TIMERANGE => [0,3500]}
>
> COLUMN    CELL
> cf:2      timestamp=2000, value=Two
> cf:3      timestamp=3000, value=Three
> 2 row(s) in 0.0120 seconds
>
>
> Why has it been designed/implemented like this?
> What is the logic behind this model?
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
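
For anyone who finds this thread later, here is a rough toy model in Python of the two behaviors discussed above. It is not HBase code. The rule for which delete markers take effect is only my reading of Lars's explanation: by default a marker newer than the queried time range still hides older cells, while with KEEP_DELETED_CELLS only markers inside the range apply. The function name visible_cells and the dictionary layout are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]  # (timestamp, value)


def visible_cells(
    cells: Dict[str, List[Cell]],
    delete_markers: Dict[str, List[int]],
    time_range: Tuple[int, int],
    keep_deleted_cells: bool = False,
) -> Dict[str, Cell]:
    """Return the newest visible (timestamp, value) per column in [lo, hi).

    Toy model only: the marker-selection rule is an assumption based on
    the behavior described in this thread, not taken from HBase source.
    """
    lo, hi = time_range
    result: Dict[str, Cell] = {}
    for column, versions in cells.items():
        markers = delete_markers.get(column, [])
        if keep_deleted_cells:
            # Only markers that fall inside the queried range take effect.
            active = [ts for ts in markers if lo <= ts < hi]
        else:
            # Default: a marker newer than the range still reaches back in time.
            active = [ts for ts in markers if ts >= lo]
        cutoff: Optional[int] = max(active) if active else None
        candidates = [
            (ts, value)
            for ts, value in versions
            if lo <= ts < hi and (cutoff is None or ts > cutoff)
        ]
        if candidates:
            result[column] = max(candidates, key=lambda cell: cell[0])
    return result


# Same data as the example in Niels's mail.
cells = {
    "cf:1": [(1000, "One")],
    "cf:2": [(2000, "Two")],
    "cf:3": [(3000, "Three")],
}
markers = {"cf:1": [4000]}  # delete 't1', 'rowid', 'cf:1', 4000

default_view = visible_cells(cells, markers, (0, 3500))
kept_view = visible_cells(cells, markers, (0, 3500), keep_deleted_cells=True)

print(sorted(default_view))  # ['cf:2', 'cf:3']
print(sorted(kept_view))     # ['cf:1', 'cf:2', 'cf:3']
```

With the default setting the model drops cf:1 from the [0,3500] view, as in Niels's second get. With keep_deleted_cells=True all three columns come back, which matches the observation at the top of this mail.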
