On Wed, Jun 13, 2012 at 12:09 PM, Ted Tuttle <[email protected]> wrote:
> My client code has a set of deletes to carry out. After successfully issuing
> 19 such deletes the client begins logging HBase errors while trying to
> complete the deletes. It logs ERRORs every 60s for 10 times and then gives
> up.
>
What kind of delete are you doing? Are you deleting individual cells? When
you say 19 deletes, is each of these a batch delete? If it is a cell delete,
we need to read the cell first to find the most recent timestamp.

Looks like we are timing out the rpc doing your batch of deletes. Could it
be that a batch is doing a bunch at one time and taking a long time to
complete? Try making smaller batches? (A delete of 144 rows taking a minute
seems like way too long though, or is the delete of a row made up of many
individual deletes? A delete of a column family on a row is cheaper than a
cell delete because it just puts a marker on the column family -- see
http://hbase.apache.org/book.html#version.delete).

> Ultimately, the RS became responsive again. Looking at monitoring I see spike
> in CPU utilization on node that is unresponsive; it goes from 2% utilization
> to 20% and sticks there for a few minutes. None of the other nodes in the
> cluster appear busy at this time.
>

Want to try thread dumping it when it goes unresponsive? That'd help us
figure out what the regionserver was doing at the time when it's burning 20%.
(Do you have gc logging enabled? Anything in the .out file at this time
when we are using CPU?)

St.Ack
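
A rough sketch of the "smaller batches" suggestion, in case it helps. It
only shows how a list of row keys could be split into chunks before each
chunk is sent as its own delete call. The class name BatchSplitter, the
batch size of 25, and the "row-N" keys are made up for illustration, and no
HBase classes are used, so the actual delete call is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {

    /** Splits items into consecutive sublists of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical row keys standing in for the 144 rows mentioned above.
        List<String> rowKeys = new ArrayList<>();
        for (int i = 0; i < 144; i++) {
            rowKeys.add("row-" + i);
        }
        for (List<String> batch : partition(rowKeys, 25)) {
            // Assumption: real client code would build one delete per key here
            // and submit the whole batch in a single call to the table.
            System.out.println("would delete " + batch.size() + " rows");
        }
    }
}
```

Smaller chunks mean each rpc has less work to finish before the client
timeout, at the cost of more round trips. The right batch size depends on
how expensive each individual delete is.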
