Re: How to config hbase0.94.2 to retain deleted data

lars hofhansl Tue, 23 Oct 2012 11:48:17 -0700

Maybe this should be stated more clearly in the documentation.

If you need to perform time range queries (Scans/Gets as of time T) and you 
want those to be correct even when data was later marked for delete, you need 
KEEP_DELETED_CELLS enabled.
If you do not care about the history of your data, or you never delete data, 
you won't need it.


This has nothing to do with how the cells were marked for delete (by the entire 
column family, column, or version). Versioning is done per cell in HBase.
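
To make the as-of-time point concrete, here is a minimal sketch of the read 
semantics as a toy in-memory model. It is not HBase code: the class 
AsOfReadModel and its methods are hypothetical names, and it assumes a column 
delete marker at timestamp ts masks every version with a timestamp <= ts. The 
data is the put/put/delete sequence from the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Toy model of a single column (one row, one family:qualifier).
// Not HBase code; all names here are hypothetical.
final class AsOfReadModel {
    private final TreeMap<Long, String> puts = new TreeMap<>();  // timestamp -> value
    private final List<Long> deleteMarkers = new ArrayList<>();  // marker timestamps

    void put(long ts, String value) {
        puts.put(ts, value);
    }

    // Assumption: a delete marker at ts masks every version with timestamp <= ts.
    void deleteUpTo(long ts) {
        deleteMarkers.add(ts);
    }

    // Returns the newest value with timestamp <= t that no applicable marker masks.
    Optional<String> asOf(long t, boolean keepDeletedCells) {
        for (Map.Entry<Long, String> cell : puts.headMap(t, true).descendingMap().entrySet()) {
            if (!isMasked(cell.getKey(), t, keepDeletedCells)) {
                return Optional.of(cell.getValue());
            }
        }
        return Optional.empty();
    }

    private boolean isMasked(long cellTs, long t, boolean keepDeletedCells) {
        for (long markerTs : deleteMarkers) {
            // With keepDeletedCells, markers newer than the query time are ignored.
            // Without it, every marker applies regardless of when it was written.
            boolean markerApplies = !keepDeletedCells || markerTs <= t;
            if (markerApplies && cellTs <= markerTs) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AsOfReadModel column = new AsOfReadModel();
        column.put(99L, "v1");
        column.put(101L, "v2");
        column.deleteUpTo(100L);
        System.out.println(column.asOf(99L, false));
        System.out.println(column.asOf(99L, true));
        System.out.println(column.asOf(101L, true));
    }
}
```

In this model the query as of 99 comes back empty without the flag and 
returns v1 with it, while the query as of 101 returns v2 either way.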



________________________________
 From: Michael Segel <[email protected]>
To: [email protected]; lars hofhansl <[email protected]> 
Sent: Tuesday, October 23, 2012 11:40 AM
Subject: Re: How to config hbase0.94.2 to retain deleted data
 
Lars, 

No, that is not what I am suggesting. 

Perhaps I am missing something. Was the OP interested in cell deletes or in 
row deletes?

Two different issues. 

On Oct 23, 2012, at 1:35 PM, lars hofhansl <[email protected]> wrote:

> HBase has time range queries. You can say "give me the data as of time T" or 
> "give me the data between X and Y". How far back you want to retain your data 
> is specified via TTL and VERSIONS.
> 
> But... If you delete the data at T+X (X>0), a query as of time T won't return 
> anything, even though at T the data was still there.
> 
> If you don't use TTL and/or VERSIONS in HBase you won't need this feature.
> 
> If you do use these you're doing so because you want to get to the older 
> data. And if you delete stuff, chances are you want KEEP_DELETED_CELLS enabled.
> So within the boundaries specified by TTL/VERSIONS you can get to the data as 
> of any time.
> 
> 
> By your logic nobody should use TTL/VERSIONS, which is nonsense.
> 
> 
> 
> ________________________________
> From: Michael Segel <[email protected]>
> To: lars hofhansl <[email protected]> 
> Cc: "[email protected]" <[email protected]> 
> Sent: Tuesday, October 23, 2012 4:41 AM
> Subject: Re: How to config hbase0.94.2 to retain deleted data
> 
> "Deleted cells are still subject to TTL and there will never be more than 
> "maximum number of versions" deleted cells. A new "raw" scan options returns 
> all deleted rows and the delete markers. "
> 
> This is different from the idea suggested by the OP. Here deleted cells still 
> get deleted. It's just that when compaction comes along, it's told to 
> ignore them. 
> 
> So if I say a column can have 3 versions (cells), then each time I insert 
> another value for that row:column key, I push that deleted cell down the 
> stack. Do that enough times and it's gone. 
> 
> In theory, this feature would be useful if I wanted an OLTP implementation on 
> top of HBase. It would allow the transaction to bridge a compaction cycle. 
> However, that's pretty much it. 
> 
> This feature doesn't translate well beyond this. 
> 
> It also raises the following question: how do I handle long (OLTP) 
> transactions, timeouts, and isolation levels? 
> 
> If you look at this at the row level... definitely not a good idea. Think of 
> fat clogging an artery.
>  
> On Oct 23, 2012, at 12:22 AM, lars hofhansl <[email protected]> wrote:
> 
>> http://hbase.apache.org/book/cf.keep.deleted.html
>> 
>> Without it you cannot do correct as-of-time queries when it comes to deletes.
>> 
>> -- Lars
>> 
>> From: Michael Segel <[email protected]>
>> To: [email protected]; lars hofhansl <[email protected]> 
>> Sent: Monday, October 22, 2012 9:18 PM
>> Subject: Re: How to config hbase0.94.2 to retain deleted data
>> 
>>> 
>>> Curious, why do you think this is better than using the keep-deleted-cells 
>>> feature?
>>> (It might well be, just curious)
>> 
>> Ok... so what exactly does this feature mean? 
>> 
>> Suppose I have 500 rows within a region. I set this feature to be true. 
>> I do a massive delete and there are only 50 rows left standing. 
>> 
>> So if I do a count of the number of rows in the region, I see only 50, yet 
>> if I compact the table, it's still full. 
>> 
>> Granted I'm talking about rows and not cells, but the idea is the same. IMHO 
>> you're asking for more headaches than you solve. 
>> 
>> KISS would suggest that moving deleted data into a different table would 
>> yield better performance in the long run. 
>> 
>> 
>> On Oct 21, 2012, at 7:23 PM, lars hofhansl <[email protected]> wrote:
>> 
>>> That'd work too. Requires the regionservers to make remote updates to other 
>>> regionservers, though. And you have to trap each and every change (Put, 
>>> Delete, Increment, Append, RowMutations, etc)
>>> 
>>> 
>>> Curious, why do you think this is better than using the keep-deleted-cells 
>>> feature?
>>> (It might well be, just curious)
>>> 
>>> 
>>> -- Lars
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Michael Segel <[email protected]>
>>> To: [email protected]
>>> Cc: 
>>> Sent: Sunday, October 21, 2012 4:34 PM
>>> Subject: Re: How to config hbase0.94.2 to retain deleted data
>>> 
>>> I would suggest that you use your coprocessor to copy the data to a 
>>> 'backup' table when you mark them for delete. 
>>> Then as major compaction hits, the rows are deleted from the main table, 
>>> but still reside undeleted in your delete table. 
>>> Call it a history table. 
>>> 
>>> 
>>> On Oct 21, 2012, at 3:53 PM, yun peng <[email protected]> wrote:
>>> 
>>>> Hi, All,
>>>> I want to retain all deleted key-value pairs in HBase. I have tried to
>>>> configure the HColumnDescriptor as follows so that deleted cells are returned.
>>>> 
>>>>   public void postOpen(ObserverContext<RegionCoprocessorEnvironment> e) {
>>>>     HTableDescriptor htd = e.getEnvironment().getRegion().getTableDesc();
>>>>     HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("cf"));
>>>>     hcd.setKeepDeletedCells(true);
>>>>     hcd.setBlockCacheEnabled(false);
>>>>   }
>>>> 
>>>> However, it does not work for me: when I issue a delete and then query
>>>> by an older timestamp, the old data does not show up.
>>>> 
>>>> hbase(main):119:0> put 'usertable', "key1", 'cf:c1', "v1", 99
>>>> hbase(main):120:0> put 'usertable', "key1", 'cf:c1', "v2", 101
>>>> hbase(main):121:0> delete 'usertable', "key1", 'cf:c1', 100
>>>> hbase(main):122:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP
>>>> => 99, VERSIONS => 4}
>>>> COLUMN                CELL
>>>> 
>>>> 0 row(s) in 0.0040 seconds
>>>> 
>>>> hbase(main):123:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP
>>>> => 100, VERSIONS => 4}
>>>> COLUMN                CELL
>>>> 
>>>> 0 row(s) in 0.0050 seconds
>>>> 
>>>> hbase(main):124:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP
>>>> => 101, VERSIONS => 4}
>>>> COLUMN                CELL
>>>> 
>>>> cf:c1                timestamp=101, value=v2
>>>> 
>>>> 1 row(s) in 0.0050 seconds
>>>> 
>>>> Note this is a new feature in 0.94.2
>>>> (HBASE-4536 <https://issues.apache.org/jira/browse/HBASE-4536>).
>>>> I did not find much sample code online, so does anyone here have
>>>> experience using HBASE-4536? How should one configure HBase to enable
>>>> this feature?
>>>> 
>>>> Thanks
>>>> Yun
>>> 
>> 
>>
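
On the VERSIONS point raised in the quoted discussion (a column limited to 3 
versions eventually pushes a deleted cell out), the following is a rough 
sketch of that windowing as a toy model. It is not HBase code: 
VersionWindowModel is a hypothetical name, and it assumes compaction simply 
keeps the newest maxVersions cells of a column whether or not they were 
deleted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of a VERSIONS limit on a single column.
// Not HBase code; all names here are hypothetical.
final class VersionWindowModel {
    private final int maxVersions;
    private final TreeSet<Long> versions = new TreeSet<>();  // timestamps of stored cells

    VersionWindowModel(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1");
        }
        this.maxVersions = maxVersions;
    }

    void put(long ts) {
        versions.add(ts);
    }

    // Assumption: compaction drops the oldest cells until only maxVersions remain,
    // so a retained deleted cell is evicted like any other old version.
    void compact() {
        while (versions.size() > maxVersions) {
            versions.pollFirst();
        }
    }

    // Timestamps still stored, oldest first.
    List<Long> retained() {
        return new ArrayList<>(versions);
    }
}
```

With maxVersions set to 3 and cells written at timestamps 1 through 4, a 
compaction in this model leaves only 2, 3, and 4, so an as-of query at 
time 1 has nothing left to return.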
