Thanks for the suggestion. I did use List<Delete> with a batch size of 1000, 
but the performance was not much different from deleting one row at a time.
I looked into the HRegion.delete() method. My understanding is that when you 
call delete() on a whole row, it actually deletes every column family for that 
row, meaning it writes a tombstone to each column family.
In my case each row has 5 column families, so each delete puts 5 tombstones on 
the row. I think that could be the reason why delete is so slow.

I am just wondering if there is any way, or any tool, to profile an HBase 
application and measure the time taken by each individual method.
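
Short of a real profiler, one rough way to see where the time goes is to wrap 
the calls of interest in a small timer. The sketch below is only an 
illustration: CallTimer and its methods are hypothetical names, not part of 
HBase, and it measures wall-clock time on the client side only, so it cannot 
see what happens inside the region server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not an HBase class): accumulates wall-clock time per label. */
class CallTimer {
    // label -> {number of calls, total elapsed nanoseconds}
    private final Map<String, long[]> stats = new LinkedHashMap<>();

    /** Runs the given work and records how long it took under the given label. */
    void time(String label, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            long[] s = stats.computeIfAbsent(label, k -> new long[2]);
            s[0]++;
            s[1] += System.nanoTime() - start;
        }
    }

    long calls(String label) {
        long[] s = stats.get(label);
        return s == null ? 0 : s[0];
    }

    long totalNanos(String label) {
        long[] s = stats.get(label);
        return s == null ? 0 : s[1];
    }

    /** Prints call count, total and average time per label. */
    void report() {
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            long n = e.getValue()[0];
            long total = e.getValue()[1];
            System.out.printf("%s: calls=%d total=%.3f ms avg=%.3f ms%n",
                    e.getKey(), n, total / 1e6, total / 1e6 / n);
        }
    }
}
```

The idea would be to wrap the single-row delete and the batched delete in 
time("single", ...) and time("batch", ...) and compare the report() output. 
Since Runnable cannot throw checked exceptions, a call that throws IOException 
would need to be caught or rethrown inside the lambda.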

Haijia

-----Original Message-----
From: Doug Meil [mailto:[email protected]] 
Sent: Tuesday, February 21, 2012 8:54 PM
To: [email protected]
Subject: Re: hbase delete operation is very slow


I don't think write-buffering is an option because that's Put-only the last 
time I looked, but the advice I put in the book is to use the 
delete(List<Delete>).  He'll have to keep track of the List<Delete> himself and 
determine when the batch should be sent, but it's a lot better than one at a 
time.
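
The bookkeeping described above (collect the deletes, decide when the batch is 
full, send it) could be sketched roughly as follows. This is a minimal sketch 
under assumptions, not code from the book: DeleteBatcher is a hypothetical 
name, and the sink is a stand-in for whatever actually sends the batch, for 
example a wrapper around delete(List<Delete>) on the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not an HBase class): buffers items and sends them in batches. */
class DeleteBatcher<T> {
    private final int batchSize;
    // Stand-in for the real send, e.g. a wrapper around delete(List<Delete>).
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    DeleteBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Queues one item and sends the batch once it reaches batchSize. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is queued; call once at the end for the final partial batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Hand over a copy so the sink may modify its list without affecting the buffer.
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With a batch size of 1000, adding 2500 items and then calling flush() hands the 
sink three lists of 1000, 1000 and 500 items. In real use the sink would also 
have to deal with the IOException that the table call can throw.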




On 2/21/12 7:39 PM, "Stack" <[email protected]> wrote:

>On Tue, Feb 21, 2012 at 2:45 PM, Doug Meil 
><[email protected]> wrote:
>>
>> Hi there-
>>
>> You probably want to see this...
>>
>> http://hbase.apache.org/book.html#perf.deleting
>>
>> .. that particular method doesn't use the write-buffer and is 
>> submitting deletes one-by-one to the RS's.
>>
>>
>
>Do what Doug suggests.  Sounds like you are setting up a Map per row 
>and then per row, figuring whether to Delete.  If a Delete, you do an 
>invocation per.  Where are you getting your table instance from?  Is it 
>created each time?  And as per Doug, are you write buffering your 
>deletes?
>
>St.Ack
>

