Ted T:
Can you log a JIRA summarizing the issue? I feel HBase should intrinsically provide better handling for cell deletion in very wide rows, without the user having to tweak timestamps.

On Thu, Jun 21, 2012 at 7:02 AM, Ted Tuttle <[email protected]> wrote:

> Good hint, Ted
>
> By calling Delete.deleteColumn(family, qual, ts) instead of deleteColumn
> w/o timestamp, the time to delete row keys is reduced by 95%.
>
> I am going to experiment w/ limited batches of Deletes, too.
>
> Thanks everyone for help on this one.
>
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: Wednesday, June 20, 2012 10:13 PM
> To: [email protected]
> Subject: Re: RS unresponsive after series of deletes
>
> As I mentioned earlier, prepareDeleteTimestamps() performs one get
> operation per column qualifier:
>     get.addColumn(family, qual);
>
>     List<KeyValue> result = get(get, false);
> This is too costly in your case.
> I think you can group some configurable number of qualifiers in each get
> and perform classification on result.
> This way we can reduce the number of times
> HRegion$RegionScannerImpl.next()
> is called.
>
> Cheers
>
> On Wed, Jun 20, 2012 at 9:54 PM, Ted Tuttle
> <[email protected]> wrote:
>
> > > Do your 100s of thousands cell deletes overlap (in terms of column
> > > family) across rows ?
> >
> > Our schema contains only one column family per table. So, each Delete
> > contains cells from a single column family. I hope this answers your
> > question.
>
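
For reference, the grouping idea in the quoted message (a configurable number of qualifiers per get, rather than one get per qualifier) could be sketched roughly as below. This is plain Java with no HBase dependency and is not the actual HRegion code: the class name QualifierBatcher, the partition() helper, and the String qualifiers are all hypothetical, chosen only to show the batching step. In a real patch, each batch would presumably feed a single Get with one addColumn(family, qual) call per qualifier, and the result would then be classified by qualifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: illustrates splitting a wide
// row's qualifiers into fixed-size groups so that each group could be
// served by one get instead of one get per qualifier.
public class QualifierBatcher {

    // Splits items into consecutive sub-lists of at most batchSize elements.
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the input list.
            batches.add(new ArrayList<T>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Assumed example data: five qualifiers, batch size of two.
        List<String> qualifiers = Arrays.asList("q1", "q2", "q3", "q4", "q5");
        for (List<String> batch : partition(qualifiers, 2)) {
            // In the scenario discussed above, this is where one get
            // covering every qualifier in the batch would be issued.
            System.out.println(batch);
        }
    }
}
```

With N qualifiers and a batch size of B, the number of gets drops from N to ceil(N / B); a suitable B would have to be found by experiment.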
