> What kind of a delete are you doing?

A mixture of row and cell deletes. Interestingly, the first 19 (successful) deletes were row deletes. The client hung while submitting its first batch of cell deletes. However, I think the cell/row distinction is a red herring, as we've experienced this behavior at least once with batches of exclusively row deletes.
> When you say 19 deletes, each of these is a batch delete?

Each of the 19 deletes is a call to HTable.delete(List<Delete>). I estimate there were about 144 Deletes in each batch. In the cell delete that failed, I estimate about 1000 column qualifiers per row, for a total of about 144k cells per batch.

> Could it be that a batch is doing a bunch at the one time and taking a long time to complete?

To issue the cell deletes, we scan each row's column keys for matches against an in-memory set of domain objects. The code that constructs the deletes completes quickly. I should add that most of our deletes are very fast, but on 3 occasions so far they have exceeded the 10 min allotted by the retry logic in the client.

> Try making smaller batches? Want to try thread dumping it when it goes unresponsive?

I will try to reproduce with a test harness.

> Do you have gc logging enabled? Anything in the .out file at this time when we are using CPU?

I don't see any GC-related operations over 10s. Here is the log from the time of the first failure to 20 min after: http://pastebin.com/AUaULHcD

-Ted
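
P.S. For the "smaller batches" experiment, here is a minimal sketch of how a batch could be split before submission. It is only an illustration: the class name BatchSplitter, the method partition, and the sub-batch size of 16 are hypothetical, and the element type is left generic so the snippet runs without HBase on the classpath. In the real client, each sub-batch would be a List<Delete> handed to HTable.delete(List<Delete>).

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {

    /**
     * Splits items into consecutive sub-lists of at most maxBatchSize elements.
     * Each sub-list is a fresh copy, so changing one batch does not affect
     * the original list or the other batches.
     */
    public static <T> List<List<T>> partition(List<T> items, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for the ~144 Deletes in one batch (integers instead of Delete objects).
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 144; i++) {
            rows.add(i);
        }
        // Hypothetical sub-batch size; in the real client each sub-batch
        // would be submitted with its own HTable.delete(...) call.
        List<List<Integer>> batches = partition(rows, 16);
        System.out.println(batches.size() + " sub-batches");
    }
}
```

With roughly 1000 column qualifiers per row, a sub-batch of 16 rows would cover about 16k cells instead of about 144k, which should make it easier to see whether the stall depends on batch size.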
