You are correct. Since we do not prune extra versions except during major compactions, which happen about once a day, deleting a recent version can expose an older one, and that is what you are seeing.
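To make the retention behavior concrete, here is a toy Java model of a single column in a family configured with a maximum number of versions. It is a sketch under stated assumptions, not HBase's implementation. The class `ToyColumnStore` and its methods are hypothetical. Following the explanation above, reads return at most `maxVersions` values, but every written version stays in the store until a major compaction prunes the extras.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/**
 * Toy model (hypothetical, NOT HBase code) of one column's stored versions.
 * Puts only add versions; nothing is pruned until majorCompact() runs.
 */
final class ToyColumnStore {
    // Plays the role of the family setting VERSIONS => '1'.
    private final int maxVersions;
    // timestamp -> value, iterated newest first.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    ToyColumnStore(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Read path: return at most maxVersions of the newest values. */
    List<String> get() {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == maxVersions) break;
            out.add(value);
        }
        return out;
    }

    /** Versions physically held in the store (what occupies space). */
    int storedVersions() {
        return versions.size();
    }

    /** Major compaction: in this model, the only point where extra versions are dropped. */
    void majorCompact() {
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = oldest, since the map is newest-first
        }
    }
}
```

For example, after three puts with `maxVersions = 1`, `get()` returns only the newest value while `storedVersions()` is still 3; only after `majorCompact()` does the store hold a single version.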
I might consider this a mis-feature. I would encourage you to consider using
the Delete.deleteColumns() call found here:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html#deleteColumns(byte[], byte[])

and NOT USE:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html#deleteColumn(byte[], byte[])

Note the only difference between these is the plurality of 'column'.

I hope this helps!

-ryan

On Mon, Jan 31, 2011 at 4:35 PM, Buttler, David <[email protected]> wrote:
> The way I understand it is that old versions do not actually disappear until
> a compaction occurs. A compaction should occur once per day unless you have
> changed the major compaction settings, or whenever a region splits.
>
> Dave
>
> -----Original Message-----
> From: Mike Percy [mailto:[email protected]]
> Sent: Friday, January 28, 2011 6:10 PM
> To: [email protected]
> Subject: Re: Delete reveals older version of a column even when VERSIONS=1
>
> Hmm... how does this relate to setting VERSIONS => '1'? By setting # of
> versions to 1 are we getting some space benefit over say VERSIONS => '10'?
>
> Thanks,
> Mike
>
> On Jan 28, 2011, at 5:47 PM, Ryan Rawson wrote:
>
>> I would call it 'a surprising, perhaps unexpected consequence of our
>> storage model'.
>>
>> There are 2 types of deletes in hbase, you are doing type (a) "delete
>> a single version", but you probably want type (b) "delete all versions
>> in this column"
>>
>> On Fri, Jan 28, 2011 at 5:43 PM, Mike Percy <[email protected]> wrote:
>>> Hi folks,
>>> I am seeing some unexpected behavior with HBase 0.20.6 when deleting
>>> columns. Our cluster has been running for some time however we recently
>>> upgraded from Hbase 0.20.3. The family I am writing to is specified as
>>> VERSIONS => '1' when doing a describe, yet HBase appears to be maintaining
>>> several versions of the columns.
>>>
>>> Below is a shell session demonstrating the problem. Is this a configuration
>>> problem, as-designed, or possibly a bug?
>>>
>>> Thanks,
>>> Mike
>>>
>>> hbase(main):004:0> put 'table', 'row', 'family:qual', '1'
>>> 0 row(s) in 0.0110 seconds
>>> hbase(main):007:0> get 'table', 'row'
>>> COLUMN CELL
>>> family:qual timestamp=1296264772717, value=1
>>> 1 row(s) in 0.0080 seconds
>>> hbase(main):008:0> put 'table', 'row', 'family:qual', '2'
>>> 0 row(s) in 0.0020 seconds
>>> hbase(main):009:0> put 'table', 'row', 'family:qual', '3'
>>> 0 row(s) in 0.0020 seconds
>>> hbase(main):010:0> get 'table', 'row'
>>> COLUMN CELL
>>> family:qual timestamp=1296264797169, value=3
>>> 1 row(s) in 0.0030 seconds
>>> hbase(main):011:0> delete 'table', 'row', 'family:qual'
>>> 0 row(s) in 0.0040 seconds
>>> hbase(main):012:0> get 'table', 'row'
>>> COLUMN CELL
>>> family:qual timestamp=1296264795365, value=2
>>> 1 row(s) in 0.0630 seconds
>>> hbase(main):013:0> delete 'table', 'row', 'family:qual'
>>> 0 row(s) in 0.0360 seconds
>>> hbase(main):014:0> get 'table', 'row'
>>> COLUMN CELL
>>> family:qual timestamp=1296264772717, value=1
>>> 1 row(s) in 0.0030 seconds
>>> hbase(main):013:0> delete 'table', 'row', 'family:qual'
>>> 0 row(s) in 0.0360 seconds
>>> hbase(main):016:0> get 'table', 'row'
>>> COLUMN CELL
>>> 0 row(s) in 0.0030 seconds
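The two delete types Ryan describes can be sketched with a second toy model, again a hypothetical illustration rather than HBase code. The names `ToyDeleteCell`, `deleteLatestVersion()` and `deleteAllVersions()` are invented here and only loosely mirror the real `Delete.deleteColumn()` (newest version only) and `Delete.deleteColumns()` (all versions). In this model a delete records a marker instead of erasing data, and reads skip masked versions before applying the version limit. That ordering is what lets an older, not-yet-compacted version reappear.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Toy model (hypothetical, NOT HBase code) of one column with delete markers.
 * Deletes never erase stored versions here; they only hide them from reads.
 */
final class ToyDeleteCell {
    private final int maxVersions;
    // timestamp -> value, iterated newest first.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());
    // Single-version markers: each hides exactly one timestamp.
    private final Set<Long> deletedTimestamps = new HashSet<>();
    // Column-wide marker: hides every version with timestamp <= this value.
    private long columnDeleteTs = Long.MIN_VALUE;

    ToyDeleteCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    private boolean isMasked(long ts) {
        return ts <= columnDeleteTs || deletedTimestamps.contains(ts);
    }

    /** Loosely like Delete.deleteColumn(): hide only the newest visible version. */
    void deleteLatestVersion() {
        for (long ts : versions.keySet()) {
            if (!isMasked(ts)) {
                deletedTimestamps.add(ts);
                return;
            }
        }
    }

    /** Loosely like Delete.deleteColumns(): hide every version at or before 'now'. */
    void deleteAllVersions(long now) {
        columnDeleteTs = Math.max(columnDeleteTs, now);
    }

    /** Read path: skip masked versions first, then keep the newest maxVersions. */
    List<String> get() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            if (out.size() == maxVersions) break;
            if (!isMasked(e.getKey())) out.add(e.getValue());
        }
        return out;
    }
}
```

Replaying Mike's session with `maxVersions = 1`: after putting values 1, 2 and 3, each `deleteLatestVersion()` call exposes the next older value (`[2]`, then `[1]`, then an empty result), matching the shell transcript, whereas a single `deleteAllVersions(now)` hides all three at once.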
