First off, J-D, thanks for helping me work through this. You've inspired some different angles and I think I've finally made it bleed in a controlled way.
> - That data you are deleting needs to be read when you scan, like I
> said earlier a delete is in fact an insert in HBase and this isn't
> cleared up until a major compaction happens.

I manually compacted the table I deleted from (via the UI). The scan times are still >10 min. Reading through each node's log, I see messages indicating that the major compactions were going to be skipped. Is it safe to say that hitting that 'Compact' button is just a recommendation? Is there an operation we can perform after a big delete to guarantee that the deletes get compacted away? (I've put a rough sketch of what I mean by the client-API equivalent below.)

> Do you have scanner caching turned on? Just to be sure set
> scan.setCaching(1) and see if it makes any difference.

A bit confused here. Under what conditions would you recommend setting the scan caching to 1? My read path has no way of knowing whether a lot of data was recently deleted, so I can't disable caching conditionally. I believe I want scan caching in general.

> Are you saying that you have Delete objects on which you did
> deleteColumn() 1000x? If so, look no further there's your problem.

I am calling deleteColumn() thousands of times per Delete object. I can delete a row with ~20k keys in ~2 sec. If I issue 10 of these (they appear to be fired off asynchronously by the client), the unresponsive RS behavior ensues. Here is a stack dump from an RS running at >90% utilization while it processes my deletes: http://pastebin.com/8y5x4xU7 Some logs from around the same time: http://pastebin.com/UpPMbsmn

So my takeaway is that the RSs don't like being slammed with hundreds of thousands of cell deletes. I can be more measured about these deletes going forward (see the second sketch at the end of this mail). That the RSs don't handle this more gracefully sounds like a bug; at a minimum, there appears to be a nonlinear response. What do you think?
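For reference, this is roughly what I mean by requesting the compaction from the client API instead of the UI button. It's just a sketch (untested, "mytable" is a placeholder, and I'm assuming the plain HBaseAdmin API we're on):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class ForceMajorCompact {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // "mytable" is a placeholder for the table I deleted from.
            // As I understand it this only *requests* a major compaction
            // asynchronously; the RS can still decide to skip it, which is
            // what the log messages seem to be telling me.
            admin.majorCompact("mytable");
        }
    }

If the button and this call end up in the same code path on the RS, then I suppose my question stands: what, if anything, forces the deletes out?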

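And this is the kind of "more measured" delete I have in mind going forward. Again only a sketch and untested; the family name, batch size, and pause are made up, and the table/qualifiers would come from my actual schema:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MeasuredDelete {
        // Placeholder names/values for illustration only.
        private static final byte[] FAMILY = Bytes.toBytes("f");
        private static final int CELLS_PER_DELETE = 1000; // instead of ~20k per Delete
        private static final long PAUSE_MS = 200;         // give the RS room to breathe

        public static void deleteRow(HTable table, byte[] row, List<byte[]> qualifiers)
                throws Exception {
            List<Delete> batch = new ArrayList<Delete>();
            Delete current = new Delete(row);
            int cells = 0;
            for (byte[] qualifier : qualifiers) {
                current.deleteColumn(FAMILY, qualifier);
                // Cap each Delete at CELLS_PER_DELETE deleteColumn() calls.
                if (++cells % CELLS_PER_DELETE == 0) {
                    batch.add(current);
                    current = new Delete(row);
                }
            }
            if (cells % CELLS_PER_DELETE != 0) {
                batch.add(current);
            }
            // Issue the Deletes one at a time with a pause in between,
            // rather than letting the client fire them all off at once.
            for (Delete d : batch) {
                table.delete(d);
                Thread.sleep(PAUSE_MS);
            }
        }
    }

That should spread the load out, but it still feels like I'm working around the nonlinear response rather than fixing it.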