Now that my cluster appears to run smoothly, after a few successful repairs and compactions, I'm back to deleting portions of data based on their insertion date. For reasons too lengthy to explain here, I don't want to use TTL.
I use a batch mutator in Pycassa to delete ~1M rows, based on a longish list of keys that I extract from an auxiliary CF (that part works with no problem of any sort). However, it appears that such a head-on delete puts a temporary but large load on the cluster: my SSDs go to 100% utilization and the CPU load spikes significantly. Does anyone throttle this kind of mass-delete procedure? Thanks in advance, Maxim
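
P.S. To make the question concrete, here is a minimal sketch of the kind of throttling I have in mind: send the deletes in small chunks and sleep between them. The function names, the chunk size and the pause are made up for illustration and are not tuned values. `batch` is assumed to be whatever `ColumnFamily.batch()` returns in Pycassa, or really any object with `remove(key)` and `send()`.

```python
import time
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def throttled_delete(keys, batch, chunk_size=500, pause_s=0.5, sleep=time.sleep):
    """Queue row deletions on `batch` in chunks, pausing after each send.

    `batch` is assumed to expose remove(key) and send(), as a Pycassa
    batch mutator does. Create it with queue_size >= chunk_size so that
    it does not flush on its own in the middle of a chunk.
    Returns the number of keys submitted for deletion.
    """
    deleted = 0
    for chunk in chunked(keys, chunk_size):
        for key in chunk:
            batch.remove(key)   # queue a whole-row delete
        batch.send()            # flush this chunk to the cluster
        deleted += len(chunk)
        sleep(pause_s)          # give disks and CPU time to recover
    return deleted


# Hypothetical usage (keyspace, host and CF names are placeholders):
#   from pycassa.pool import ConnectionPool
#   from pycassa.columnfamily import ColumnFamily
#   pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
#   cf = ColumnFamily(pool, 'MyCF')
#   throttled_delete(keys, cf.batch(queue_size=500), chunk_size=500, pause_s=0.5)
```

The chunk size and the pause are the two knobs here; I would expect to tune them by watching disk utilization, but I don't know what values are reasonable, hence the question.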