Sorry for chiming in late.

Are you sure you want to use Delete.deleteColumn and not Delete.deleteColumns (note the plural form)?

deleteColumn marks a single version of a column (of a CF of a row) for deletion.
deleteColumns marks all versions of a column as deleted (or, if you specify a timestamp, all versions up to and including that timestamp).
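To make the single-version vs. all-versions distinction concrete, here is a toy in-memory model of one column's versions. It is only a sketch: the class and method names are made up for illustration and are not the HBase client API, and real HBase writes delete markers rather than removing cells immediately.

```java
import java.util.TreeMap;

// Toy model of the versions of ONE column in ONE row (timestamp -> value).
// Hypothetical names throughout; this is not the HBase API.
class ColumnVersionsSketch {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Analogue of deleteColumn(family, qual): only the latest version goes away.
    // Note that the latest timestamp has to be looked up first.
    void deleteLatestVersion() {
        if (!versions.isEmpty()) {
            versions.remove(versions.lastKey());
        }
    }

    // Analogue of deleteColumn(family, qual, ts): exactly one version, no lookup needed.
    void deleteVersionAt(long ts) {
        versions.remove(ts);
    }

    // Analogue of deleteColumns(family, qual): every version goes away.
    void deleteAllVersions() {
        versions.clear();
    }

    boolean hasVersion(long ts) {
        return versions.containsKey(ts);
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        ColumnVersionsSketch col = new ColumnVersionsSketch();
        col.put(1L, "a");
        col.put(2L, "b");
        col.put(3L, "c");

        col.deleteLatestVersion();   // removes the version at ts=3 only
        System.out.println("after deleteLatestVersion: " + col.versionCount());

        col.deleteAllVersions();     // removes everything that is left
        System.out.println("after deleteAllVersions: " + col.versionCount());
    }
}
```

In this model, removing only the latest version leaves the older ones visible again, which is usually not what people expect from a "delete".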
deleteColumns is what you want in most cases, unless you have to carefully control individual versions of a specific column in a specific row.

-- Lars

________________________________
From: Ted Tuttle <[email protected]>
To: [email protected]
Cc: Development <[email protected]>
Sent: Thursday, June 21, 2012 7:02 AM
Subject: RE: RS unresponsive after series of deletes

Good hint, Ted

By calling Delete.deleteColumn(family, qual, ts) instead of deleteColumn w/o timestamp, the time to delete row keys is reduced by 95%.

I am going to experiment w/ limited batches of Deletes, too.

Thanks everyone for help on this one.

-----Original Message-----
From: Ted Yu [mailto:[email protected]]
Sent: Wednesday, June 20, 2012 10:13 PM
To: [email protected]
Subject: Re: RS unresponsive after series of deletes

As I mentioned earlier, prepareDeleteTimestamps() performs one get operation per column qualifier:

          get.addColumn(family, qual);
          List<KeyValue> result = get(get, false);

This is too costly in your case.
I think you can group some configurable number of qualifiers in each get and perform classification on result.
This way we can reduce the number of times HRegion$RegionScannerImpl.next() is called.

Cheers

On Wed, Jun 20, 2012 at 9:54 PM, Ted Tuttle <[email protected]> wrote:

> > Do your 100s of thousands cell deletes overlap (in terms of column family)
> > across rows ?
>
> Our schema contains only one column family per table. So, each Delete
> contains cells from a single column family. I hope this answers your
> question.
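
Both ideas raised in this thread, grouping a configurable number of qualifiers into each get and sending Deletes in limited batches, come down to splitting a long list into fixed-size groups. A minimal sketch of that step, with a hypothetical helper name and plain strings standing in for qualifiers (no HBase calls involved):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: splits a list into consecutive
// groups of at most batchSize elements.
class QualifierBatcher {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            // Copy the sublist so each batch is independent of the input list.
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> qualifiers = Arrays.asList("q1", "q2", "q3", "q4", "q5");
        // One lookup (or one batch of Deletes) per group instead of one per qualifier.
        for (List<String> batch : partition(qualifiers, 2)) {
            System.out.println(batch);
        }
    }
}
```

The batch size is the "configurable number" here; what a good value is depends on the workload and would have to be measured.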
