RE: RS unresponsive after series of deletes

Ted Tuttle Thu, 14 Jun 2012 12:37:23 -0700

> What kind of a delete are you doing?  

A mixture of row and cell deletes.  Interestingly, the first 19
(successful) deletes were row deletes.  The client got hung up while
submitting its first batch of cell deletes.  However, I think the
cell/row distinction is a red herring as we've experienced this behavior
at least once with batches of exclusively row deletes.


> When you say 19 deletes, each of these is a batch delete?

Each of the 19 deletes is a call to HTable.delete(List<Delete>).  I
estimate there were about 144 Deletes in each batch.  In the cell
delete that failed, I estimate about 1000 column qualifiers per row for
a total of about 144k cells per batch.
  
> Could it be that a batch is doing a bunch at the one time and taking a
> long time to complete?

In order to issue the cell delete we scan each row's column keys for
matches to an in-memory set of domain objects.  The code to construct
the delete is completing quickly.

I should add that most of our deletes are very fast.  But on 3 occasions
thus far, they have exceeded the 10 min allotted by the retry logic in
the client.

> Try making smaller batches?  Want to try thread dumping it when it
> goes unresponsive?

I will try to reproduce w/ test harness.
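
For reference, the smaller-batch suggestion could be sketched roughly as
below.  This is only an illustration, not our client code: the class
name ChunkedSubmit, the helper submitInChunks, and the chunk size of 25
are all made up for the example, and the sink is a stand-in for a call
to HTable.delete(List<Delete>) (which throws a checked IOException, so
a real sink would have to wrap it).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedSubmit {

    /**
     * Hypothetical helper: splits items into consecutive chunks of at most
     * chunkSize elements and hands each chunk to sink, in order.
     * Returns the number of chunks submitted.
     */
    public static <T> int submitInChunks(List<T> items, int chunkSize,
                                         Consumer<List<T>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int calls = 0;
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Pass a copy of the slice so the sink can modify its list
            // without affecting the caller's original list.
            sink.accept(new ArrayList<>(items.subList(start, end)));
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        // Stand-in for one batch of about 144 Deletes (one per row).
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 144; i++) {
            rows.add(i);
        }
        // Assumption: the sink here only prints; in a real client it would
        // submit the chunk to the table instead.
        int calls = submitInChunks(rows, 25,
                chunk -> System.out.println("submitting " + chunk.size() + " deletes"));
        System.out.println(calls + " calls");
    }
}
```

With a chunk size like this, a stall would at least be confined to one
small call, which should make it easier to tell which rows were in
flight when the RS stopped responding.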
  
> Do you have gc logging enabled?  Anything in the .out file at this
> time when we are using CPU?

I don't see any GC-related operations over 10s.  Here is the log from
the time of the first failure to 20 min after: http://pastebin.com/AUaULHcD

-Ted
