You are correct. Since we do not prune extra versions except during
the major compactions that happen about once a day, deleting a recent
version can expose an older version, and that is what you are seeing.
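
To make the point above concrete, here is a minimal sketch of the behavior
as a toy model in plain Java. It is not HBase code: the class
ToyColumnStore and its methods are hypothetical names made up for
illustration, and the model assumes that extra versions are physically
dropped only by a major compaction.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Toy model of the cells stored for one column. NOT HBase code.
class ToyColumnStore {
    // timestamp -> value, ordered newest timestamp first
    private final TreeMap<Long, String> cells =
            new TreeMap<Long, String>(Comparator.<Long>reverseOrder());
    private final int maxVersions; // plays the role of VERSIONS => 'n'

    ToyColumnStore(int maxVersions) { this.maxVersions = maxVersions; }

    // A put only adds a cell; nothing is pruned at write time.
    void put(long ts, String value) { cells.put(ts, value); }

    // A read returns the newest stored cell, or null if there is none.
    String get() { return cells.isEmpty() ? null : cells.firstEntry().getValue(); }

    // Models deleting a single version: only the newest cell goes away.
    void deleteNewestVersion() { if (!cells.isEmpty()) cells.pollFirstEntry(); }

    // Models a major compaction: drop the oldest cells beyond maxVersions.
    void majorCompact() { while (cells.size() > maxVersions) cells.pollLastEntry(); }

    int storedVersions() { return cells.size(); }

    public static void main(String[] args) {
        ToyColumnStore beforeCompaction = new ToyColumnStore(1);
        beforeCompaction.put(1L, "1");
        beforeCompaction.put(2L, "2");
        beforeCompaction.put(3L, "3");
        beforeCompaction.deleteNewestVersion();
        System.out.println(beforeCompaction.get()); // prints 2: older version exposed

        ToyColumnStore afterCompaction = new ToyColumnStore(1);
        afterCompaction.put(1L, "1");
        afterCompaction.put(2L, "2");
        afterCompaction.put(3L, "3");
        afterCompaction.majorCompact();
        afterCompaction.deleteNewestVersion();
        System.out.println(afterCompaction.get()); // prints null: nothing left to expose
    }
}
```

In this model the only difference between the two runs is whether the
compaction happened before the delete, which matches the timing
dependence described above.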

I might consider this a mis-feature.  I would encourage you to
consider using the Delete.deleteColumns() call found here:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html#deleteColumns(byte[], byte[])

and NOT USE:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html#deleteColumn(byte[], byte[])

Note the only difference between these is the plurality of 'column'.
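
As a rough illustration of how the two calls differ, the sketch below
models them with two kinds of delete markers. It is a toy in plain Java,
not the HBase client: the class ToyTombstones is hypothetical, and the
assumption that deleteColumn hides only the newest visible version while
deleteColumns hides every version up to a given timestamp is a
simplification of the real semantics.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Toy model of the two delete types for one column. NOT the HBase client.
class ToyTombstones {
    private final TreeMap<Long, String> cells = new TreeMap<Long, String>(); // ts -> value
    private final Set<Long> versionMarkers = new HashSet<Long>(); // each hides exactly one ts
    private long columnMarker = Long.MIN_VALUE;                   // hides every ts <= this

    void put(long ts, String value) { cells.put(ts, value); }

    private boolean hidden(long ts) {
        return ts <= columnMarker || versionMarkers.contains(ts);
    }

    // Returns the newest value not hidden by a marker, or null.
    String get() {
        for (Long ts : cells.descendingKeySet()) {
            if (!hidden(ts)) return cells.get(ts);
        }
        return null;
    }

    // Modeled on deleteColumn (singular): hide only the newest visible version.
    void deleteColumn() {
        for (Long ts : cells.descendingKeySet()) {
            if (!hidden(ts)) { versionMarkers.add(ts); return; }
        }
    }

    // Modeled on deleteColumns (plural): hide all versions at or before `now`.
    void deleteColumns(long now) { columnMarker = Math.max(columnMarker, now); }

    public static void main(String[] args) {
        ToyTombstones single = new ToyTombstones();
        single.put(1L, "1"); single.put(2L, "2"); single.put(3L, "3");
        single.deleteColumn();
        System.out.println(single.get()); // prints 2: the older version shows through

        ToyTombstones all = new ToyTombstones();
        all.put(1L, "1"); all.put(2L, "2"); all.put(3L, "3");
        all.deleteColumns(3L);
        System.out.println(all.get()); // prints null: every version is hidden
    }
}
```

Under these assumptions, repeated single-version deletes walk back through
values 3, 2, 1 the way the shell session does, while one column-wide
delete hides all of them at once.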

I hope this helps!
-ryan

On Mon, Jan 31, 2011 at 4:35 PM, Buttler, David <[email protected]> wrote:
> The way I understand it is that old versions do not actually disappear until 
> a compaction occurs.  A compaction should occur once per day unless you have 
> changed the major compaction settings, or whenever a region splits.
>
> Dave
>
>
>
> -----Original Message-----
> From: Mike Percy [mailto:[email protected]]
> Sent: Friday, January 28, 2011 6:10 PM
> To: [email protected]
> Subject: Re: Delete reveals older version of a column even when VERSIONS=1
>
> Hmm... how does this relate to setting VERSIONS => '1'? By setting # of 
> versions to 1 are we getting some space benefit over say VERSIONS => '10'?
>
> Thanks,
> Mike
>
> On Jan 28, 2011, at 5:47 PM, Ryan Rawson wrote:
>
>> I would call it 'a surprising, perhaps unexpected consequence of our
>> storage model'.
>>
>> There are 2 types of deletes in hbase, you are doing type (a) "delete
>> a single version", but you probably want type (b) "delete all versions
>> in this column"
>>
>>
>>
>> On Fri, Jan 28, 2011 at 5:43 PM, Mike Percy <[email protected]> wrote:
>>> Hi folks,
>>> I am seeing some unexpected behavior with HBase 0.20.6 when deleting 
>>> columns. Our cluster has been running for some time however we recently 
>>> upgraded from Hbase 0.20.3. The family I am writing to is specified as 
>>> VERSIONS => '1' when doing a describe, yet HBase appears to be maintaining 
>>> several versions of the columns.
>>>
>>> Below is a shell session demonstrating the problem. Is this a configuration 
>>> problem, as-designed, or possibly a bug?
>>>
>>> Thanks,
>>> Mike
>>>
>>> hbase(main):004:0> put 'table', 'row', 'family:qual', '1'
>>> 0 row(s) in 0.0110 seconds
>>> hbase(main):007:0> get 'table', 'row'
>>> COLUMN                       CELL
>>>  family:qual                   timestamp=1296264772717, value=1
>>> 1 row(s) in 0.0080 seconds
>>> hbase(main):008:0> put 'table', 'row', 'family:qual', '2'
>>> 0 row(s) in 0.0020 seconds
>>> hbase(main):009:0> put 'table', 'row', 'family:qual', '3'
>>> 0 row(s) in 0.0020 seconds
>>> hbase(main):010:0> get 'table', 'row'
>>> COLUMN                       CELL
>>>  family:qual                   timestamp=1296264797169, value=3
>>> 1 row(s) in 0.0030 seconds
>>> hbase(main):011:0> delete 'table', 'row', 'family:qual'
>>> 0 row(s) in 0.0040 seconds
>>> hbase(main):012:0> get 'table', 'row'
>>> COLUMN                       CELL
>>>  family:qual                   timestamp=1296264795365, value=2
>>> 1 row(s) in 0.0630 seconds
>>> hbase(main):013:0> delete 'table', 'row', 'family:qual'
>>> 0 row(s) in 0.0360 seconds
>>> hbase(main):014:0> get 'table', 'row'
>>> COLUMN                       CELL
>>>  family:qual                   timestamp=1296264772717, value=1
>>> 1 row(s) in 0.0030 seconds
>>> hbase(main):013:0> delete 'table', 'row', 'family:qual'
>>> 0 row(s) in 0.0360 seconds
>>> hbase(main):016:0> get 'table', 'row'
>>> COLUMN                       CELL
>>> 0 row(s) in 0.0030 seconds
>>>
>>>
>
>