Re: Why does a delete behave like this?

lars hofhansl Mon, 09 Dec 2013 20:51:30 -0800

https://issues.apache.org/jira/browse/HBASE-9005  :)
Just have to do it now.




________________________________
 From: Ted Yu <[email protected]>
To: "[email protected]" <[email protected]>; lars hofhansl 
<[email protected]> 
Sent: Monday, December 9, 2013 8:16 PM
Subject: Re: Why does a delete behave like this?
 


I ran the following shell command to create the table:

hbase(main):001:0> create 't1', {NAME => 'cf', KEEP_DELETED_CELLS => true}


The second get command returns the same result as the first.
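
For anyone following along without a cluster at hand, the difference between the two behaviours can be mimicked in a few lines of Python. This is only a toy sketch under stated assumptions, not HBase code: the function name get, the tuple layout and the keep_deleted_cells flag are all made up for illustration. It assumes a delete marker at timestamp T covers every version of that column with a timestamp <= T, and that TIMERANGE is the half-open interval [min, max).

```python
# Toy model of the behaviour discussed in this thread; NOT HBase code.
# Assumptions (hypothetical names): cells are (column, timestamp, value)
# tuples, delete markers are (column, timestamp) tuples, a marker covers
# every version of its column with timestamp <= the marker's timestamp,
# and the time range is the half-open interval [min_ts, max_ts).

def get(cells, deletes, min_ts, max_ts, keep_deleted_cells=False):
    """Return {column: (timestamp, value)} for the newest visible version."""
    result = {}
    # Walk versions oldest first so that newer versions overwrite older ones.
    for column, ts, value in sorted(cells, key=lambda c: c[1]):
        if not (min_ts <= ts < max_ts):
            continue  # version is outside the requested time range
        masked = False
        for d_column, d_ts in deletes:
            if d_column != column or ts > d_ts:
                continue  # marker is for another column or older than the cell
            if keep_deleted_cells and d_ts >= max_ts:
                continue  # marker lies after the time range, so it is ignored
            masked = True
        if not masked:
            result[column] = (ts, value)
    return result


cells = [("cf:1", 1000, "One"), ("cf:2", 2000, "Two"), ("cf:3", 3000, "Three")]
deletes = [("cf:1", 4000)]

default_view = get(cells, deletes, 0, 3500)
kdc_view = get(cells, deletes, 0, 3500, keep_deleted_cells=True)
print(sorted(default_view))  # ['cf:2', 'cf:3']
print(sorted(kdc_view))      # ['cf:1', 'cf:2', 'cf:3']
```

In this model the only difference between the two modes is whether a marker that lies after the end of the requested time range is still allowed to mask older cells.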

Lars:
The refguide doesn't cover such usage. Do you think we should document it?

Cheers



On Mon, Dec 9, 2013 at 2:53 PM, lars hofhansl <[email protected]> wrote:

This is because by default a delete marker extends all the way back in time.
>When you set KEEP_DELETED_CELLS for your column family this behavior is fixed.
>I.e. you get correct timerange query behavior even w.r.t. deletes.
>
>
>-- Lars
>
>
>
>________________________________
> From: Niels Basjes <[email protected]>
>To: user <[email protected]>
>Sent: Monday, December 9, 2013 12:47 AM
>Subject: Why does a delete behave like this?
>
>
>
>Hi,
>
>When I first started learning about HBase I compared the logic of setting
>new values to something that is similar to the way a tool like Subversion
>works: When you set a new value you don't overwrite the old one, you simply
>create a new version.
>Just like Subversion, you can then at a later moment retrieve the old value
>and that way see the situation at an earlier date.
>
>(The only real variation to the SVN model is that HBase only retains the
>last N versions of a cell.)
>
>There is however one situation where this comparison really fails: When you
>do a delete on a cell.
>If you want to retrieve the state of a thing from subversion and in the
>current version this thing has been deleted then you can still get it back.
>With HBase however if you delete a cell you place a tombstone at a specific
>time and as such internally the older values are still present.
>
>But when you try to retrieve such an older value then you still get an
>empty result back (i.e. no such cell).
>The direct consequence of the currently implemented model is that an
>application can never retrieve the correct state of a row at an older
>timestamp if a delete on any cell has occurred.
>
>Example:
>
>I create a table with one row:
>
>> create 't1', 'cf'
>> put 't1', 'rowid', 'cf:1', 'One', 1000
>> put 't1', 'rowid', 'cf:2', 'Two', 2000
>> put 't1', 'rowid', 'cf:3', 'Three', 3000
>> get 't1', 'rowid' , {TIMERANGE => [0,3500]}
>
>    COLUMN                     CELL
>     cf:1                      timestamp=1000, value=One
>     cf:2                      timestamp=2000, value=Two
>     cf:3                      timestamp=3000, value=Three
>    3 row(s) in 0.0150 seconds
>
>Then the delete of a cell at a later timestamp:
>
>> delete 't1', 'rowid', 'cf:1', 4000
>
>Now if I retrieve the row at time 3500, I would expect to still see the
>same values as above.
>This, however, is what I actually get:
>
>> get 't1', 'rowid' , {TIMERANGE => [0,3500]}
>
>    COLUMN                     CELL
>     cf:2                      timestamp=2000, value=Two
>     cf:3                      timestamp=3000, value=Three
>    2 row(s) in 0.0120 seconds
>
>
>Why has it been designed/implemented like this?
>What is the logic behind this model?
>
>--
>Best regards / Met vriendelijke groeten,
>
>Niels Basjes
