Thanks for the suggestion. I did use List<Delete> with a batch size of 1000, but the performance was not much different from deleting one row at a time. I looked into the HRegion.delete() method, and my understanding is that when you call delete() on a whole row, it first deletes every column family of that row, meaning it writes a tombstone to each column family. In my case each row has 5 column families, so each delete results in 5 tombstones being written for the row. I think that could be the reason the delete is so slow.
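The batching approach described above can be sketched roughly as follows. This is a minimal, HBase-free illustration under stated assumptions, not the code that was actually run: DeleteBatcher and its methods are hypothetical names, and the sink is a stand-in for a call such as HTable.delete(List<Delete>). It assumes Java 8 or later for java.util.function.Consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of HBase): buffers items and hands them
// to a sink in fixed-size batches. In a real client the sink would wrap
// something like HTable.delete(List<Delete>).
class DeleteBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();
    private int batchesSent = 0;

    DeleteBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Queue one item; send the batch as soon as it is full.
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is still queued; call once after the last add().
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Pass a copy so the sink owns its list and the buffer can be reused.
        sink.accept(new ArrayList<>(pending));
        pending.clear();
        batchesSent++;
    }

    int batchesSent() {
        return batchesSent;
    }
}
```

With a batch size of 1000, adding 2500 items and then calling flush() sends three batches (1000, 1000, and 500 items), so the caller makes three calls instead of 2500. Batching only reduces the number of round trips; it does not change how many tombstones each row delete writes on the server.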
I am just wondering if there is any way, or any tool, to profile an HBase application and measure the time taken by each individual method.

Haijia

-----Original Message-----
From: Doug Meil [mailto:[email protected]]
Sent: Tuesday, February 21, 2012 8:54 PM
To: [email protected]
Subject: Re: hbase delete operation is very slow

I don't think write-buffering is an option because that's Put-only the last time I looked, but the advice I put in the book is to use the delete(List<Delete>). He'll have to keep track of the List<Delete> himself and determine when the batch should be sent, but it's a lot better than one at a time.

On 2/21/12 7:39 PM, "Stack" <[email protected]> wrote:

>On Tue, Feb 21, 2012 at 2:45 PM, Doug Meil
><[email protected]> wrote:
>>
>> Hi there-
>>
>> You probably want to see this...
>>
>> http://hbase.apache.org/book.html#perf.deleting
>>
>> .. that particular method doesn't use the write-buffer and is
>> submitting deletes one-by-one to the RS's.
>>
>>
>
>Do what Doug suggests. Sounds like you are setting up a Map per row
>and then per row, figuring whether to Delete. If a Delete, you do an
>invocation per. Where are you getting your table instance from? Is it
>created each time? And as per Doug, are you write buffering your
>deletes?
>
>St.Ack
>
