How often do you need to perform such a delete operation? Is there a way to use TTL so that you can avoid deletions?
Pardon me for not knowing your use case very well.

On Feb 8, 2013, at 10:16 PM, Varun Sharma <[email protected]> wrote:

> Using HBase 0.94.3. Tried that too, ran into performance issues with
> having to retrieve the entire row first (this was getting slow when one
> particular row is hammered), since the row can be big (a few megs,
> sometimes 10s of megs), and then finding the columns and then doing a
> delete.
>
> To me, it looks like the current implementation of deleteColumn is
> suboptimal because of the 300 gets vs doing 1.
>
> Thanks
> Varun
>
> On Fri, Feb 8, 2013 at 10:09 PM, Ted Yu <[email protected]> wrote:
>
>> Which HBase version are you using?
>>
>> Is there a way to place 10 delete markers from the application side
>> instead of 300?
>>
>> Thanks
>>
>> On Fri, Feb 8, 2013 at 10:05 PM, Varun Sharma <[email protected]> wrote:
>>
>>> We are given a set of 300 columns to delete. I tested two cases:
>>>
>>> 1) deleteColumns() - with the 's'
>>>
>>> This function simply adds delete markers for 300 columns. In our case,
>>> typically only a fraction of these columns are actually present - 10.
>>> After starting to use deleteColumns, we started seeing a drop in
>>> cluster-wide random read performance - 90th percentile latency
>>> worsened, so did 99th, probably because of having to traverse delete
>>> markers. I attribute this to the profusion of delete markers in the
>>> cluster. Major compactions slowed down by almost 50 percent, probably
>>> because of having to clean out significantly more delete markers.
>>>
>>> 2) deleteColumn()
>>>
>>> Ended up with intolerable 15 second calls, which clogged all the
>>> handlers, making the cluster pretty much unresponsive.
>>>
>>> On Fri, Feb 8, 2013 at 9:55 PM, Ted Yu <[email protected]> wrote:
>>>
>>>> For the 300 column deletes, can you show us how the Delete(s) are
>>>> constructed?
>>>>
>>>> Do you use this method?
>>>>
>>>>   public Delete deleteColumns(byte [] family, byte [] qualifier) {
>>>> Thanks
>>>>
>>>> On Fri, Feb 8, 2013 at 9:44 PM, Varun Sharma <[email protected]> wrote:
>>>>
>>>>> So a Get call with multiple columns on a single row should be much
>>>>> faster than independent Get(s) on each of those columns for that
>>>>> row. I am basically seeing severely poor performance (~ 15 seconds)
>>>>> for certain deleteColumn() calls, and I am seeing that there is a
>>>>> prepareDeleteTimestamps() function in HRegion.java which first tries
>>>>> to locate the column by doing individual gets on each column you
>>>>> want to delete (I am doing 300 column deletes). Now, I think this
>>>>> should ideally be 1 get call with the batch of 300 columns, so that
>>>>> one scan can retrieve the columns, and the columns that are found
>>>>> are indeed deleted.
>>>>>
>>>>> Before I try this fix, I wanted to get an opinion on whether it will
>>>>> make a difference to batch the get(), and it seems from your answer
>>>>> that it should.
>>>>>
>>>>> On Fri, Feb 8, 2013 at 9:34 PM, lars hofhansl <[email protected]> wrote:
>>>>>
>>>>>> Everything is stored as a KeyValue in HBase.
>>>>>> The Key part of a KeyValue contains the row key, column family,
>>>>>> column name, and timestamp, in that order.
>>>>>> Each column family has its own store and store files.
>>>>>>
>>>>>> So in a nutshell a get is executed by starting a scan at the row
>>>>>> key (which is a prefix of the key) in each store (CF) and then
>>>>>> scanning forward in each store until the next row key is reached.
>>>>>> (In reality it is a bit more complicated due to multiple versions,
>>>>>> skipping columns, etc.)
>>>>>>
>>>>>>
>>>>>> -- Lars
>>>>>> ________________________________
>>>>>> From: Varun Sharma <[email protected]>
>>>>>> To: [email protected]
>>>>>> Sent: Friday, February 8, 2013 9:22 PM
>>>>>> Subject: Re: Get on a row with multiple columns
>>>>>>
>>>>>> Sorry, I was a little unclear with my question.
>>>>>>
>>>>>> Let's say you have
>>>>>>
>>>>>> Get get = new Get(row);
>>>>>> get.addColumn("1");
>>>>>> get.addColumn("2");
>>>>>> .
>>>>>> .
>>>>>> .
>>>>>>
>>>>>> When HBase internally executes the batch get, it will seek to
>>>>>> column "1". Now, since data is lexicographically sorted, it does
>>>>>> not need to seek from the beginning to get to "2"; it can continue
>>>>>> seeking from there, since column "2" will always be after column
>>>>>> "1". I want to know whether this is how a multicolumn get on a row
>>>>>> works or not.
>>>>>>
>>>>>> Thanks
>>>>>> Varun
>>>>>>
>>>>>> On Fri, Feb 8, 2013 at 9:08 PM, Marcos Ortiz <[email protected]> wrote:
>>>>>>
>>>>>>> Like Ishan said, a get gives an instance of the Result class.
>>>>>>> All utility methods that you can use are:
>>>>>>> byte[] getValue(byte[] family, byte[] qualifier)
>>>>>>> byte[] value()
>>>>>>> byte[] getRow()
>>>>>>> int size()
>>>>>>> boolean isEmpty()
>>>>>>> KeyValue[] raw() # Like Ishan said, all data here is sorted
>>>>>>> List<KeyValue> list()
>>>>>>>
>>>>>>> On 02/08/2013 11:29 PM, Ishan Chhabra wrote:
>>>>>>>
>>>>>>>> Based on what I read in Lars' book, a get will return a Result,
>>>>>>>> which is internally a KeyValue[]. This KeyValue[] is sorted by
>>>>>>>> the key, and you access this array using the raw or list methods
>>>>>>>> on the Result object.
>>>>>>>>
>>>>>>>> On Fri, Feb 8, 2013 at 5:40 PM, Varun Sharma <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> +user
>>>>>>>>>
>>>>>>>>> On Fri, Feb 8, 2013 at 5:38 PM, Varun Sharma <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> When I do a Get on a row with multiple column qualifiers, do
>>>>>>>>>> we sort the column qualifiers and make use of the sorted order
>>>>>>>>>> when we get the results?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Varun
>>>>>>> --
>>>>>>> Marcos Ortiz Valmaseda,
>>>>>>> Product Manager && Data Scientist at UCI
>>>>>>> Blog: http://marcosluis2186.posterous.com
>>>>>>> Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
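
For anyone reading this thread later, here is a rough sketch of the idea discussed above: looking up many qualifiers in one sorted row with a single forward pass, compared with one independent lookup per qualifier. This is a toy model in plain Java, not HBase code. The class and method names (SortedRowScanSketch, independentLookups, singleForwardPass, buildRow, buildWanted) are made up for illustration, the row is just a sorted String array, and a real store can seek through its indexes instead of walking cells one by one. So the absolute counts mean nothing; only the difference in shape between the two strategies is the point. The sample data mirrors the case in the thread: 300 wanted qualifiers, of which 10 exist in the row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of one row in one store: qualifiers held in sorted order. Not HBase code. */
final class SortedRowScanSketch {

    /** Qualifiers that were found, plus the number of key comparisons performed. */
    static final class Outcome {
        final List<String> found = new ArrayList<>();
        long comparisons = 0;
    }

    /** Strategy A: one independent lookup per qualifier, each starting at the row start. */
    static Outcome independentLookups(String[] sortedRow, String[] wanted) {
        Outcome out = new Outcome();
        for (String q : wanted) {
            for (String cell : sortedRow) {
                out.comparisons++;
                int cmp = cell.compareTo(q);
                if (cmp == 0) {
                    out.found.add(q);
                }
                if (cmp >= 0) {
                    break; // row is sorted, so q cannot appear any later
                }
            }
        }
        return out;
    }

    /** Strategy B: sort the wanted qualifiers once, then walk the row a single time. */
    static Outcome singleForwardPass(String[] sortedRow, String[] wanted) {
        Outcome out = new Outcome();
        String[] sortedWanted = wanted.clone();
        Arrays.sort(sortedWanted);
        int i = 0; // position in the row
        int j = 0; // position in the wanted list (assumed to hold distinct qualifiers)
        while (i < sortedRow.length && j < sortedWanted.length) {
            out.comparisons++;
            int cmp = sortedRow[i].compareTo(sortedWanted[j]);
            if (cmp == 0) {
                out.found.add(sortedWanted[j]);
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // this cell sorts before the wanted qualifier, skip it
            } else {
                j++; // the wanted qualifier is not present in the row
            }
        }
        return out;
    }

    /** Sample row: 5000 zero-padded qualifiers for the even numbers below 10000. */
    static String[] buildRow() {
        String[] row = new String[5000];
        for (int k = 0; k < row.length; k++) {
            row[k] = String.format("c%05d", 2 * k);
        }
        return row;
    }

    /** Sample request: 290 odd qualifiers (absent) and 10 even ones (present). */
    static String[] buildWanted() {
        String[] wanted = new String[300];
        for (int k = 0; k < 290; k++) {
            wanted[k] = String.format("c%05d", 2 * k + 1);
        }
        for (int m = 0; m < 10; m++) {
            wanted[290 + m] = String.format("c%05d", 1000 * m);
        }
        return wanted;
    }

    public static void main(String[] args) {
        String[] row = buildRow();
        String[] wanted = buildWanted();
        Outcome a = independentLookups(row, wanted);
        Outcome b = singleForwardPass(row, wanted);
        System.out.println("independent lookups: found=" + a.found.size()
                + " comparisons=" + a.comparisons);
        System.out.println("single forward pass: found=" + b.found.size()
                + " comparisons=" + b.comparisons);
    }
}
```

In this model the single pass never does more comparisons than the row length plus the number of wanted qualifiers, while the per-qualifier strategy repeats work from the row start every time. Whether batching the gets inside prepareDeleteTimestamps() gives a comparable win on a real cluster would still need to be measured.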
