Does it work? :)

How did you do the deletes before? I assume you used the
HTable.delete(List<Delete>) API?

(Doesn't really help you, but) In 0.92+ you could hook up a coprocessor into 
the compactions and simply filter out any KVs you want to have removed.
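
To make the compaction-filter idea concrete, here is a rough toy sketch in plain Java. It uses no HBase classes at all; CompactionFilterModel, compact() and the "row/column" string keys are made-up names for illustration, and the real thing would sit in a coprocessor hooked into compaction rather than in a standalone class. It only shows the shape of the idea: the compaction rewrites the files anyway, so unwanted KVs can simply be left out of the output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Toy model only (not HBase code): dropping unwanted keys while files are merged. */
public class CompactionFilterModel {

    /**
     * Merges several "store files" (lists of keys) into one sorted output,
     * skipping every key for which shouldDrop returns true.
     */
    public static List<String> compact(List<List<String>> files,
                                       Predicate<String> shouldDrop) {
        List<String> merged = new ArrayList<>();
        for (List<String> file : files) {
            for (String key : file) {
                if (!shouldDrop.test(key)) {
                    merged.add(key); // kept: written to the compacted file
                }
                // dropped keys are never written, so no delete marker is needed
            }
        }
        Collections.sort(merged); // a compaction produces a single sorted file
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> files = new ArrayList<>();
        files.add(List.of("r1/cf:a", "r3/cf:a"));
        files.add(List.of("r2/cf:a"));
        // Hypothetical business rule: remove everything belonging to row r2.
        System.out.println(compact(files, key -> key.startsWith("r2/")));
    }
}
```

The catch with this approach is that nothing is removed until a compaction actually rewrites the files that hold the affected KVs.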


-- Lars



________________________________
 From: Paul Mackles <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Friday, October 5, 2012 11:17 AM
Subject: bulk deletes
 
We need to do deletes pretty regularly and sometimes we could have hundreds of 
millions of cells to delete. TTLs won't work for us because we have a fair 
amount of bizlogic around the deletes.

Given their current implementation (we are on 0.90.4), this delete process can 
take a really long time (half a day or more with 100 or so concurrent threads). 
From everything I can tell, the performance issues come down to each delete 
being an individual RPC call (even when using the batch API). In other words, I 
don't see any thrashing on hbase while this process is running – just lots of 
waiting for the RPC calls to return.

The alternative we came up with is to use the standard bulk load facilities to 
handle the deletes. The code turned out to be surprisingly simple and appears to 
work in the small-scale tests we have tried so far. Is anyone else doing 
deletes in this fashion? Are there drawbacks that I might be missing? Here is 
a link to the code:

https://gist.github.com/3841437

Pretty simple, eh? I haven't seen much mention of this technique which is why I 
am a tad paranoid about it.
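
For anyone who wants the idea without opening the gist, below is a toy model in plain Java of why a bulk-loaded file containing only delete markers should hide older cells. This is not the gist code and not HBase code: DeleteMarkerModel, Cell, addFile() and get() are invented for illustration, and the model assumes column-style delete markers that mask every put in the same row/column with a timestamp at or below the marker's.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only (not HBase code): delete markers in one file mask puts in another. */
public class DeleteMarkerModel {

    /** Either a put (with a value) or a delete marker for one row/column at a timestamp. */
    public static final class Cell {
        final String row, column, value;
        final long ts;
        final boolean isDelete;

        Cell(String row, String column, long ts, boolean isDelete, String value) {
            this.row = row; this.column = column; this.ts = ts;
            this.isDelete = isDelete; this.value = value;
        }
    }

    public static Cell put(String row, String column, long ts, String value) {
        return new Cell(row, column, ts, false, value);
    }

    public static Cell delete(String row, String column, long ts) {
        return new Cell(row, column, ts, true, null);
    }

    /** All files of the store; a "bulk load" just adds one more file. */
    private final List<List<Cell>> files = new ArrayList<>();

    public void addFile(List<Cell> file) {
        files.add(file);
    }

    /** Returns the newest visible value, or null if nothing is visible. */
    public String get(String row, String column) {
        long newestDelete = Long.MIN_VALUE;
        Cell newestPut = null;
        for (List<Cell> file : files) {
            for (Cell c : file) {
                if (!c.row.equals(row) || !c.column.equals(column)) continue;
                if (c.isDelete) {
                    newestDelete = Math.max(newestDelete, c.ts);
                } else if (newestPut == null || c.ts > newestPut.ts) {
                    newestPut = c;
                }
            }
        }
        // A marker masks puts whose timestamp is <= the marker's timestamp.
        if (newestPut == null || newestPut.ts <= newestDelete) return null;
        return newestPut.value;
    }
}
```

In this model the old data is only hidden at read time, not removed; in HBase the markers and the cells they mask stay on disk until a major compaction.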

Thanks,
Paul
