Re: RS unresponsive after series of deletes

Stack Thu, 14 Jun 2012 10:39:24 -0700

On Wed, Jun 13, 2012 at 12:09 PM, Ted Tuttle
<[email protected]> wrote:
> My client code has a set of deletes to carry out.  After successfully issuing 
> 19 such deletes the client begins logging HBase errors while trying to 
> complete the deletes.  It logs ERRORs every 60s for 10 times and then gives 
> up.
>


What kind of delete are you doing?  Are you deleting individual
cells?  When you say 19 deletes, is each of these a batch delete?  For
a cell delete, we need to read the cell first to find the most recent
timestamp.  It looks like the rpc doing your batch of deletes is
timing out.  Could it be that a batch is doing a lot at one time
and taking a long time to complete?  Try making smaller batches?
(Delete of 144 rows taking a minute seems like way too long though; or
is the delete of a row made up of many individual deletes?  A delete
of a column family on a row is cheaper than a cell delete because it
just puts a marker on the column family -- See
http://hbase.apache.org/book.html#version.delete).
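
If you want to try the smaller-batches idea, here is a minimal sketch of
splitting one big list into chunks.  It uses only the JDK; the class name
DeleteBatcher, the method partition, and the batch size of 50 are made up
for illustration and are not part of HBase.  In your client, the elements
would presumably be Delete objects and each chunk would be handed to
HTable.delete(List<Delete>) in its own call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an HBase class: splits a list into smaller batches.
class DeleteBatcher {

    /** Returns consecutive sublists of at most batchSize elements each. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for 144 row deletes; integers are used so this runs without HBase.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 144; i++) {
            rows.add(i);
        }
        for (List<Integer> batch : partition(rows, 50)) {
            // Assumption: in a real client this would be table.delete(batch).
            System.out.println("would issue a batch of " + batch.size() + " deletes");
        }
    }
}
```

Timing each chunk separately should also show whether one particular
batch is the slow one or whether all of them are slow.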

> Ultimately, the RS became responsive again. Looking at monitoring I see spike 
> in CPU utilization on node that is unresponsive; it goes from 2% utilization 
> to 20% and sticks there for a few minutes.  None of the other nodes in the 
> cluster appear busy at this time.
>

Want to try thread dumping it when it goes unresponsive?  That'd help
us figure out what the regionserver was doing at the time when it's
burning 20% CPU.  (Do you have gc logging enabled?  Anything in the
.out file at this time when we are using CPU?)

St.Ack
